
Practical semantic web – creating a catalog of linked data
April 4, 2010

Posted by Anand Mallaya in computers, internet, technology, tutorial, web.

Today I am going to create a semantic web document: a catalog, published in RDF, of the linked data datasets listed on linkeddata.org. These are the steps I followed.

  1. Choose the correct vocabulary: there are generic vocabularies such as Dublin Core and FOAF, and specialized vocabularies for creating catalogs such as DCAT and VoID. DCAT is designed for government data catalogs, so I chose the VoID vocabulary, which is designed for a single dataset provider. It reuses generic vocabularies such as FOAF and Dublin Core as well.
  2. Select suitable tools: I need a tool to edit the RDF document. There are plenty of RDF editors, such as the Rhodonite tool for RDF editing and browsing, but I couldn't understand it well because of its poor documentation and help. So I chose an online VoID editor from DERI Galway. Its output is in Turtle format, but there are tools that convert a Turtle document to RDF/XML, such as the online RDF Validator/Converter at rdfabout.com.
  3. Create the semantic graph: first I choose a dataset and add it to my catalog. To start with, I chose the CrunchBase entry listed on linkeddata.org. Go to the VoID editor and enter the following details:
  4. Dataset URI: http://cb.semsol.org/

    Dataset Homepage URI: http://cb.semsol.org/

    Dataset Name: Crunchbase

    Dataset Description: RDFized Crunchbase entries

    Example Resource: http://cb.semsol.org/company/yahoo

    Dataset Topic: business, database

    Vocabulary URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#, http://www.w3.org/2003/01/geo/wgs84_pos#, http://cb.semsol.org/ns#

    Publisher: http://semsol.com

    SPARQL endpoint: http://cb.semsol.org/sparql

    Now the entry for the Crunchbase dataset is ready in the VoID vocabulary, in the text area on the right side of the editor. It is in Turtle notation.

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix : <#> .

    ## your dataset
    <http://cb.semsol.org/> rdf:type void:Dataset ;
        foaf:homepage <http://cb.semsol.org/> ;
        dcterms:title "Crunchbase" ;
        dcterms:description "RDFized Crunchbase entries" ;
        dcterms:publisher <http://semsol.com> ;
        void:sparqlEndpoint <http://cb.semsol.org/sparql> ;
        void:vocabulary <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
        void:vocabulary <http://www.w3.org/2003/01/geo/wgs84_pos#> ;
        void:vocabulary <http://cb.semsol.org/ns#> ;
        void:exampleResource <http://cb.semsol.org/company/yahoo> ;
        dcterms:subject <http://dbpedia.org/resource/Database> ;
        dcterms:subject <http://dbpedia.org/resource/Business> .
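
If you have many datasets to describe, typing every entry into the online form gets tedious. The sketch below shows one way the same Turtle could be produced from the form values with plain Python. It is not how the VoID editor works internally; the function name void_turtle and its parameters are my own invention for illustration, and it only handles simple single-line string values.

```python
# Hypothetical helper (not part of the VoID editor): renders the form
# values from the step above as a Turtle description of one void:Dataset.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
}


def void_turtle(uri, homepage, title, description, publisher,
                sparql_endpoint, vocabularies, example, subjects):
    def iri(value):
        return f"<{value}>"

    def literal(value):
        # Escapes backslashes and double quotes only; multi-line
        # strings are not handled in this sketch.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    pairs = [
        ("foaf:homepage", iri(homepage)),
        ("dcterms:title", literal(title)),
        ("dcterms:description", literal(description)),
        ("dcterms:publisher", iri(publisher)),
        ("void:sparqlEndpoint", iri(sparql_endpoint)),
    ]
    pairs += [("void:vocabulary", iri(v)) for v in vocabularies]
    pairs.append(("void:exampleResource", iri(example)))
    pairs += [("dcterms:subject", iri(s)) for s in subjects]

    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"{iri(uri)} rdf:type void:Dataset ;")
    body = [f"    {predicate} {obj}" for predicate, obj in pairs]
    # Statements about the same subject are separated by ';' and the
    # last one is closed with '.'.
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines) + "\n"


crunchbase = void_turtle(
    uri="http://cb.semsol.org/",
    homepage="http://cb.semsol.org/",
    title="Crunchbase",
    description="RDFized Crunchbase entries",
    publisher="http://semsol.com",
    sparql_endpoint="http://cb.semsol.org/sparql",
    vocabularies=[
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "http://www.w3.org/2003/01/geo/wgs84_pos#",
        "http://cb.semsol.org/ns#",
    ],
    example="http://cb.semsol.org/company/yahoo",
    subjects=[
        "http://dbpedia.org/resource/Database",
        "http://dbpedia.org/resource/Business",
    ],
)
print(crunchbase)
```

The printed text can be pasted into a Turtle validator in the same way as the editor output.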

  5. Convert to an RDF/XML file (serialize): copy the dataset details in Turtle notation, go to the RDF/XML converter tool, paste the Turtle content there, select N-Triples/Turtle as the input format, and click validate. The result is given below.
  6. <?xml version="1.0"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <void:Dataset rdf:about="http://cb.semsol.org/">
    <foaf:homepage rdf:resource="http://cb.semsol.org/" />
    <dcterms:title>Crunchbase</dcterms:title>
    <dcterms:description>RDFized Crunchbase entries</dcterms:description>
    <dcterms:publisher rdf:resource="http://semsol.com" />
    <void:sparqlEndpoint rdf:resource="http://cb.semsol.org/sparql" />
    <void:vocabulary rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
    <void:vocabulary rdf:resource="http://www.w3.org/2003/01/geo/wgs84_pos#" />
    <void:vocabulary rdf:resource="http://cb.semsol.org/ns#" />
    <void:exampleResource rdf:resource="http://cb.semsol.org/company/yahoo" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Database" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Business" />
    </void:Dataset>
    </rdf:RDF>
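
A quick way to check that the converted entry says what you meant is to read it back with a script. The sketch below uses only Python's standard xml.etree.ElementTree module; the function name describe_datasets is hypothetical. Note the assumption: it reads only the element layout shown above (a void:Dataset element with rdf:about and property children). RDF/XML allows other equivalent layouts, and a real RDF parser would be needed to handle those.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"

# The converter output from the step above.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dcterms="http://purl.org/dc/terms/">
<void:Dataset rdf:about="http://cb.semsol.org/">
<foaf:homepage rdf:resource="http://cb.semsol.org/" />
<dcterms:title>Crunchbase</dcterms:title>
<dcterms:description>RDFized Crunchbase entries</dcterms:description>
<dcterms:publisher rdf:resource="http://semsol.com" />
<void:sparqlEndpoint rdf:resource="http://cb.semsol.org/sparql" />
<void:vocabulary rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
<void:vocabulary rdf:resource="http://www.w3.org/2003/01/geo/wgs84_pos#" />
<void:vocabulary rdf:resource="http://cb.semsol.org/ns#" />
<void:exampleResource rdf:resource="http://cb.semsol.org/company/yahoo" />
<dcterms:subject rdf:resource="http://dbpedia.org/resource/Database" />
<dcterms:subject rdf:resource="http://dbpedia.org/resource/Business" />
</void:Dataset>
</rdf:RDF>
"""


def describe_datasets(rdf_xml):
    """Return one summary dict per void:Dataset element in the document."""
    root = ET.fromstring(rdf_xml)
    summaries = []
    # ElementTree addresses namespaced names as {namespace-uri}localname.
    for node in root.findall(f"{{{VOID}}}Dataset"):
        summaries.append({
            "uri": node.get(f"{{{RDF}}}about"),
            "title": node.findtext(f"{{{DCTERMS}}}title"),
            "sparql_endpoints": [
                e.get(f"{{{RDF}}}resource")
                for e in node.findall(f"{{{VOID}}}sparqlEndpoint")
            ],
            "vocabularies": [
                e.get(f"{{{RDF}}}resource")
                for e in node.findall(f"{{{VOID}}}vocabulary")
            ],
        })
    return summaries


datasets = describe_datasets(RDF_XML)
for entry in datasets:
    print(entry["title"], entry["uri"], entry["sparql_endpoints"])
```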

  7. Now repeat the above process and create entries for all the datasets listed on linkeddata.org.
  8. Combine the RDF entries into a single file. There is no need to copy all the tags from each entry, only the part starting from <void:Dataset ...>.
    Something like:
    <?xml version="1.0"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <void:Dataset rdf:about="http://cb.semsol.org/">


    </void:Dataset>
    <void:Dataset rdf:about="http://dbpedia.org/">
    ….
    ….
    </void:Dataset>
    <void:Dataset rdf:about="http://www.geonames.org/">


    </void:Dataset>
    </rdf:RDF>
  9. The catalog is now ready in RDF/XML with all the datasets added. Save it as a file with the extension .rdf, for example Linked_data_catalog.rdf. Now a machine can (if programmed wisely) understand to some extent what data are available there, so that they can be converted into meaningful information and then into knowledge.
    You can find the catalog here: Linkeddata datasets catalog (note: not yet ready, come back later)
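
Combining the entries by hand is error-prone once there are more than a few, so the merge step can be scripted. The following is a minimal sketch using only Python's standard library; the function name merge_catalog is hypothetical, the second sample entry is left empty on purpose (I have not written its properties yet), and it assumes every input file uses the same void:Dataset element layout as the Crunchbase entry.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "void": "http://rdfs.org/ns/void#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF = NAMESPACES["rdf"]
VOID = NAMESPACES["void"]

# Keep the familiar prefixes in the serialized output instead of ns0, ns1...
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def merge_catalog(documents):
    """Copy every void:Dataset from each RDF/XML string into one rdf:RDF."""
    catalog = ET.Element(f"{{{RDF}}}RDF")
    seen = set()
    for document in documents:
        root = ET.fromstring(document)
        for dataset in root.findall(f"{{{VOID}}}Dataset"):
            about = dataset.get(f"{{{RDF}}}about")
            if about in seen:
                continue  # skip an entry that was already added
            seen.add(about)
            catalog.append(dataset)
    return ET.tostring(catalog, encoding="unicode")


HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:void="http://rdfs.org/ns/void#" '
    'xmlns:dcterms="http://purl.org/dc/terms/">'
)
crunchbase_doc = (
    HEADER
    + '<void:Dataset rdf:about="http://cb.semsol.org/">'
    + "<dcterms:title>Crunchbase</dcterms:title>"
    + "</void:Dataset></rdf:RDF>"
)
# Placeholder entry: only the dataset URI, no properties yet.
dbpedia_doc = HEADER + '<void:Dataset rdf:about="http://dbpedia.org/" /></rdf:RDF>'

merged = merge_catalog([crunchbase_doc, dbpedia_doc, crunchbase_doc])
print(merged)
```

The returned string can then be written to a file such as Linked_data_catalog.rdf. Duplicate entries are skipped by their rdf:about value, so feeding the same file twice is harmless.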

Comments»

1. woddiscovery - April 4, 2010

Very nice post, good analysis of the tools and well explained. Some remarks:

1. The voiD editor ve2 has a new location (http://lab.linkeddata.deri.ie/ve2/) and would be great if this change is reflected. I don’t plan to further maintain or update the version on the old location.

2. As I already mentioned on Twitter, I think it would be more beneficial if you submit the voiD files you create to a voiD store such as http://void.rkbexplorer.com/ (btw, you can directly announce voiD files to stores and indexer in ve2) rather than maintaining a single huge document on your own. The reasons are manifold: stores scale and are community-driven, offer a SPARQL endpoint and a look-up facility.

3. It’s interesting to see that quite a number of your proposed steps deal with low-level formatting/syntax issues (RDF/XML, etc.) which are actually rather irrelevant in the broader context.

Anyway, great to see interest in Linked Data metadata and KUTGW!

Cheers,
Michael

anandcv - April 4, 2010

Thanks Michael for your valuable opinions.
re: 1. VoID editor link – I have modified the link.
2. About the VoID store and separate files – I will consider that for sure.
3. About low-level syntax etc. – I also intend to give people a deeper insight into the mechanisms. It can surely be automated in due course.
The whole linked data/semantic web/open data debate seems to be very confusing. So I started working on things rather than reading standards and articles, in order to get a clearer picture. I am starting with the topmost level, cataloging, and will then go step by step into its applications. Data conversion itself is a huge job to be done in order to realize the vision.
Glad that it is inviting expert interaction.

2. woddiscovery - April 4, 2010

Thanks! Hm … the link to the voiD editor still seems to be the old one (http://ld2sd.deri.org/ve2/) rather than the new location (http://lab.linkeddata.deri.ie/ve2/).

I think your approach is very good (doing rather than reading), however, one also should not forget to review what is out there already in order to be both efficient and effective ;)

Good to see the discussion happening – looking forward to see more from you.

Cheers,
Michael

anandcv - April 5, 2010

Sorry Michael, I have corrected it.
Thanks again for the support and suggestions.
I will personally inform you of my latest work on Semantic Web/Linked Data.
Regards
Anand

3. Richard Cyganiak - April 9, 2010

Anand, this is a great idea and I’d love to see the outcome, even if it doesn’t cover all of the datasets in the cloud! The resulting RDF could be easily combined with other voiD data collections to create a unified catalog of datasets.

anandcv - April 10, 2010

Thanks for sharing your thoughts Richard.
The LoD graph visualization itself gives us enough information on how the datasets are connected to each other. So I thought I would start working on a semantic graph of the same, as the topmost entry point to linked data. I am still working on it, and some of the datasets are vaguely documented, which makes them difficult to catalog. I will definitely ping you once I finish the work.
Regards
Anand

