Practical semantic web – creating a catalog of Linked Data
April 4, 2010

Posted by Anand Mallaya in computers, internet, technology, tutorial, web.

Today I am going to work on creating a semantic web document. I am going to make a catalog of the Linked Data datasets listed on linkeddata.org and publish that list in RDF.

  1. Choose the correct vocabulary: there are generic vocabularies such as Dublin Core and FOAF, and specialized vocabularies such as DCAT and VoID for creating catalogs. DCAT is designed for government data catalogs, so I chose the VoID vocabulary, which is designed for a single dataset provider. It also uses generic vocabularies such as FOAF and DC.
  2. Select suitable tools: you need a tool to edit the RDF document. There are plenty of RDF editors, such as Rhodonite, a tool for RDF editing and browsing, but I couldn't understand it well because of its poor documentation and help. So I chose an online VoID editor from DERI Galway. Its output is in Turtle format, but there are tools to convert a Turtle document to RDF/XML, such as this online one: the RDF Validator/Converter at rdfabout.com.
  3. Create the semantic graph: first choose a dataset and add it to the catalog. To start with, I chose the CrunchBase entry listed on linkeddata.org. Go to the VoID editor and enter the following details:
  4. Dataset URI: http://cb.semsol.org/

    Dataset Homepage URI: http://cb.semsol.org/

    Dataset Name: Crunchbase

    Dataset Description: RDFized Crunchbase entries

    Example Resource: http://cb.semsol.org/company/yahoo

    Dataset Topic: business, database

    Vocabulary URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#, http://www.w3.org/2003/01/geo/wgs84_pos#, http://cb.semsol.org/ns#

    Publisher: http://semsol.com

    SPARQL endpoint: http://cb.semsol.org/sparql

    Now the dataset entry for the CrunchBase dataset is ready in the VoID vocabulary, in the text area on the right side. It is in Turtle notation.

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix : <#> .

    ## your dataset
    <http://cb.semsol.org/> rdf:type void:Dataset ;
        foaf:homepage <http://cb.semsol.org/> ;
        dcterms:title "Crunchbase" ;
        dcterms:description "RDFized Crunchbase entries" ;
        dcterms:publisher <http://semsol.com> ;
        void:sparqlEndpoint <http://cb.semsol.org/sparql> ;
        void:vocabulary <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
        void:vocabulary <http://www.w3.org/2003/01/geo/wgs84_pos#> ;
        void:vocabulary <http://cb.semsol.org/ns#> ;
        void:exampleResource <http://cb.semsol.org/company/yahoo> ;
        dcterms:subject <http://dbpedia.org/resource/Database> ;
        dcterms:subject <http://dbpedia.org/resource/Business> .
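
If you would rather not fill in the editor form by hand for every dataset, the same kind of Turtle entry can be generated from the form fields with a few lines of Python. This is only a rough sketch under my own assumptions: the dictionary keys and the function `void_dataset_turtle` are names I made up for illustration (they are not part of the VoID editor), and only the property names are taken from the entry above.

```python
# Sketch: render one VoID dataset entry as Turtle from a dict of form fields.
# The field names and helper functions are hypothetical, chosen for this example.

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
}


def iri(value):
    """Wrap a URI in angle brackets, as Turtle expects."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def void_dataset_turtle(fields):
    """Return a Turtle document describing a single void:Dataset."""
    pairs = [
        ("rdf:type", "void:Dataset"),
        ("foaf:homepage", iri(fields["homepage"])),
        ("dcterms:title", literal(fields["name"])),
        ("dcterms:description", literal(fields["description"])),
        ("dcterms:publisher", iri(fields["publisher"])),
        ("void:sparqlEndpoint", iri(fields["sparql_endpoint"])),
    ]
    pairs += [("void:vocabulary", iri(v)) for v in fields["vocabularies"]]
    pairs.append(("void:exampleResource", iri(fields["example_resource"])))
    pairs += [("dcterms:subject", iri(t)) for t in fields["topics"]]

    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(iri(fields["uri"]))
    body = [f"    {predicate} {obj}" for predicate, obj in pairs]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines) + "\n"


# Values copied from the CrunchBase form fields in this post.
crunchbase = {
    "uri": "http://cb.semsol.org/",
    "homepage": "http://cb.semsol.org/",
    "name": "Crunchbase",
    "description": "RDFized Crunchbase entries",
    "publisher": "http://semsol.com",
    "sparql_endpoint": "http://cb.semsol.org/sparql",
    "vocabularies": [
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "http://www.w3.org/2003/01/geo/wgs84_pos#",
        "http://cb.semsol.org/ns#",
    ],
    "example_resource": "http://cb.semsol.org/company/yahoo",
    "topics": [
        "http://dbpedia.org/resource/Database",
        "http://dbpedia.org/resource/Business",
    ],
}

turtle = void_dataset_turtle(crunchbase)
print(turtle)
```

The sketch does not validate the URIs in any way, so it is still worth running the output through a Turtle validator before publishing it.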

  5. Convert to an RDF/XML file (serialize): copy the dataset details in Turtle notation and go to the RDF/XML converter tool. Paste the Turtle content there, select N-Triples/Turtle as the input format, and click validate. The result is given below.
  6. <?xml version="1.0"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <void:Dataset rdf:about="http://cb.semsol.org/">
    <foaf:homepage rdf:resource="http://cb.semsol.org/" />
    <dcterms:title>Crunchbase</dcterms:title>
    <dcterms:description>RDFized Crunchbase entries</dcterms:description>
    <dcterms:publisher rdf:resource="http://semsol.com" />
    <void:sparqlEndpoint rdf:resource="http://cb.semsol.org/sparql" />
    <void:vocabulary rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
    <void:vocabulary rdf:resource="http://www.w3.org/2003/01/geo/wgs84_pos#" />
    <void:vocabulary rdf:resource="http://cb.semsol.org/ns#" />
    <void:exampleResource rdf:resource="http://cb.semsol.org/company/yahoo" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Database" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Business" />
    </void:Dataset>
    </rdf:RDF>

  7. Now repeat the above process and create entries for all the datasets listed on linkeddata.org.
  8. Combine the RDF entries into a single file. There is no need to copy all the tags from each entry, only the part starting from <void:Dataset ...>. The result looks
    something like
    <?xml version="1.0"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <void:Dataset rdf:about="http://cb.semsol.org/">


    </void:Dataset>
    <void:Dataset rdf:about="http://dbpedia.org/">
    ….
    ….
    </void:Dataset>
    <void:Dataset rdf:about="http://www.geonames.org/">


    </void:Dataset>
    </rdf:RDF>
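
Copying the <void:Dataset> blocks by hand gets tedious once there are many entries, so here is a minimal sketch of how the merge could be scripted with Python's standard `xml.etree.ElementTree` module. The function name `merge_entries` and the two shortened sample entries are my own placeholders (the DBpedia title in particular is made up for the example); the real entries would carry all the properties shown earlier.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VOID_NS = "http://rdfs.org/ns/void#"
NAMESPACES = {
    "rdf": RDF_NS,
    "void": VOID_NS,
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Ask ElementTree to write rdf:, void:, ... instead of generated ns0:, ns1: prefixes.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def merge_entries(documents):
    """Collect every void:Dataset from several RDF/XML documents under one rdf:RDF root."""
    catalog = ET.Element(f"{{{RDF_NS}}}RDF")
    for text in documents:
        root = ET.fromstring(text)
        for dataset in root.findall(f"{{{VOID_NS}}}Dataset"):
            catalog.append(dataset)
    return catalog


# Two shortened, illustrative entries (placeholders, not the full converter output).
ENTRY_CRUNCHBASE = """<?xml version="1.0"?>
<rdf:RDF xmlns:void="http://rdfs.org/ns/void#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <void:Dataset rdf:about="http://cb.semsol.org/">
    <dcterms:title>Crunchbase</dcterms:title>
  </void:Dataset>
</rdf:RDF>"""

ENTRY_DBPEDIA = """<?xml version="1.0"?>
<rdf:RDF xmlns:void="http://rdfs.org/ns/void#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <void:Dataset rdf:about="http://dbpedia.org/">
    <dcterms:title>DBpedia</dcterms:title>
  </void:Dataset>
</rdf:RDF>"""

catalog = merge_entries([ENTRY_CRUNCHBASE, ENTRY_DBPEDIA])
catalog_xml = ET.tostring(catalog, encoding="unicode")
print(catalog_xml)
```

ElementTree only writes namespace declarations for the prefixes that are actually used, so the root element of the merged file may list fewer xmlns attributes than the hand-made one above.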
  9. And the catalog is ready in RDF/XML with all the datasets added. Save it as a file with the .rdf extension, such as Linked_data_catalog.rdf. Now a machine can understand (if programmed wisely), to some extent, what data is available there, so that it can be converted into meaningful information and then into knowledge.
    You can find the catalog here: Linkeddata datasets catalog (note: not yet ready, come back later)
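
To give an idea of what "programmed wisely" could mean in practice, here is a small sketch that reads a catalog like the one above and lists each dataset with its title, SPARQL endpoint and topics. It uses only Python's standard library and assumes the catalog has exactly the shape shown in this post (typed <void:Dataset> elements directly under <rdf:RDF>). RDF/XML allows other ways to write the same graph, so a real consumer would be better off with a proper RDF parser. The function name `list_datasets` and the inline sample are my own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
}

# A shortened copy of the catalog from this post, inlined so the example is self-contained.
CATALOG = """<?xml version="1.0"?>
<rdf:RDF xmlns:void="http://rdfs.org/ns/void#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <void:Dataset rdf:about="http://cb.semsol.org/">
    <dcterms:title>Crunchbase</dcterms:title>
    <void:sparqlEndpoint rdf:resource="http://cb.semsol.org/sparql" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Database" />
    <dcterms:subject rdf:resource="http://dbpedia.org/resource/Business" />
  </void:Dataset>
</rdf:RDF>"""


def list_datasets(xml_text):
    """Return one dict per void:Dataset found directly under the rdf:RDF root."""
    rdf_about = f"{{{NS['rdf']}}}about"
    rdf_resource = f"{{{NS['rdf']}}}resource"
    root = ET.fromstring(xml_text)
    datasets = []
    for dataset in root.findall("void:Dataset", NS):
        endpoint = dataset.find("void:sparqlEndpoint", NS)
        datasets.append({
            "uri": dataset.get(rdf_about),
            "title": dataset.findtext("dcterms:title", default="", namespaces=NS),
            # Not every dataset has to offer a SPARQL endpoint.
            "sparql_endpoint": endpoint.get(rdf_resource) if endpoint is not None else None,
            "subjects": [s.get(rdf_resource) for s in dataset.findall("dcterms:subject", NS)],
        })
    return datasets


datasets = list_datasets(CATALOG)
for entry in datasets:
    print(entry["title"], entry["uri"], entry["sparql_endpoint"])
```

From here a program could, for example, pick out the datasets tagged with a given topic and send queries to their SPARQL endpoints.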