IBM hosts a nice tutorial, "Introduction to Jena" (https://www.ibm.com/developerworks/xml/library/j-jena/), which was pointed out to me by my supervisor Andra Waagmeester. The tutorial teaches how to use Jena to create an RDF model and how to query information out of it with SPARQL. It is very clear and helpful, but since I am no expert in Java and am quite good at Python, I will try to translate its code into Python. To do this I'll be using the Python RDFLib library.
The reason I do this is to give a clear overview of what my code does later on, since most people in this department do not have the Python expertise that I do. This gives them a template for understanding my scripts better. Working through the tutorial will also give me more insight and experience for optimizing my scripts.
To replicate the tutorial in Python, I needed to find out which Python libraries allow me to build RDF models. As I mentioned in earlier blog posts, RDFLib is the library that lets me do this.
In this blog post I will show the scripts I have created to do the same things as the Jena tutorial on IBM, but in Python.
For listing 1 of the IBM tutorial I have written an equivalent Python script:
def listing1():
    # import library
    import rdflib
    from rdflib.graph import ConjunctiveGraph as Graph
    from rdflib.namespace import Namespace
    from rdflib.term import Literal
    from rdflib.term import URIRef
    # create namespaces
    familyUri = Namespace("http://family/")
    relationshipUri = Namespace("http://purl.org/vocab/relationship/")
    # Create subject and identify them with family URI
    adam = familyUri["adam"]
    beth = familyUri["beth"]
    chuck = familyUri["chuck"]
    dotty = familyUri["dotty"]
    edward = familyUri["edward"]
    fran = familyUri["fran"]
    greg = familyUri["greg"]
    harriet = familyUri["harriet"]
    # create a property for the different types of relationships (predicates)
    childOf = relationshipUri["childOf"]
    parentOf = relationshipUri["parentOf"]
    siblingOf = relationshipUri["siblingOf"]
    spouseOf = relationshipUri["spouseOf"]
    # create empty graph (Model)
    graph = rdflib.Graph()
    # add statement to graph
    graph.add((adam, siblingOf, beth))
    graph.add((adam, spouseOf, dotty))
    graph.add((adam, parentOf, edward))
    graph.add((adam, parentOf, fran))
    graph.add((dotty, parentOf, edward))
    graph.add((dotty, parentOf, fran))
    graph.add((beth, siblingOf, adam))
    graph.add((beth, spouseOf, chuck))
    graph.add((edward, childOf, adam))
    graph.add((edward, childOf, dotty))
    graph.add((edward, siblingOf, fran))
    graph.add((fran, childOf, adam))
    graph.add((fran, childOf, dotty))
    graph.add((fran, siblingOf, edward))
    graph.add((fran, parentOf, harriet))
    graph.add((fran, spouseOf, greg))
    graph.add((greg, parentOf, harriet))
    graph.add((harriet, childOf, fran))
    graph.add((harriet, childOf, greg))
    # Commit
    graph.commit()
    # write data
    graph.serialize("IBM_example.ttl", format="turtle")
    # Close graph
    graph.close()
listing1()
The scripts for listings 2 and 3 in Python can be seen here:
def listing2():
    # import library
    import rdflib
    from rdflib.graph import ConjunctiveGraph as Graph
    from rdflib.namespace import Namespace
    from rdflib.term import Literal
    from rdflib.term import URIRef
    from rdflib import RDF
    # create namespaces
    familyUri = Namespace("http://family/")
    relationshipUri = Namespace("http://purl.org/vocab/relationship/")
    # create empty graph (Model)
    graph = rdflib.Graph()
    # parse an existing file
    graph.parse('IBM_example.ttl', format='n3')
    print "List everyone in the model who has a child (as subject):"
    prop_parents = list(graph.subject_objects(relationshipUri["parentOf"]))
    for item in prop_parents:
        print item[0]
    print "List everyone in the model who has a child (as object):"
    prop_parents = list(graph.subject_objects(relationshipUri["childOf"]))
    for item in prop_parents:
        print item[1]
    print "List everyone in the model who has a sibling (as subject and object):"
    prop_parents = list(graph.subject_objects(relationshipUri["siblingOf"]))
    for item in prop_parents:
        print item[0], "\t", item[1]
def listing3():
    # import library
    import rdflib
    from rdflib.graph import ConjunctiveGraph as Graph
    from rdflib.namespace import Namespace
    from rdflib.term import Literal
    from rdflib.term import URIRef
    from rdflib import RDF
    import pprint
    import os
    import sys
    # create namespaces
    familyUri = Namespace("http://family/")
    relationshipUri = Namespace("http://purl.org/vocab/relationship/")
    # create empty graph (Model)
    graph = rdflib.Graph()
    # parse an existing file
    graph.parse('IBM_example.ttl', format='n3')
    print "Find the exact statement adam is a spouse of dotty:"
    compl_trip = list(graph.triples((familyUri["adam"], relationshipUri["spouseOf"], familyUri["dotty"])))
    for item in compl_trip:
        print item[0], "\t", item[1], "\t", item[2]
    print "Find all statements with adam as the subject and dotty as the object:"
    subj_obj = list(graph.triples((familyUri["adam"], None, familyUri["dotty"])))
    for item in subj_obj:
        print item[0], "\t", item[1], "\t", item[2]
    print "Find any statements made about adam:"
    # adam is the subject here; predicate and object are left open
    subj_obj = list(graph.triples((familyUri["adam"], None, None)))
    for item in subj_obj:
        print item[0], "\t", item[1], "\t", item[2]
    print "Find any statement with the siblingOf property:"
    subj_obj = list(graph.triples((None, relationshipUri["siblingOf"], None)))
    for item in subj_obj:
        print item[0], "\t", item[1], "\t", item[2]
listing3()
With the subject_objects method of the graph you can find every subject and object that are connected by a certain predicate. The triples method of the graph lets me find exact matches of triples. Here you can also put None in any position where you don't know the URI or Literal, and it will find the triples based on the one or two values you did enter. Basically, I can use the triples method the same way I would use subject_objects.
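To make the wildcard behaviour concrete, here is a minimal sketch in plain Python (not RDFLib itself) of how a pattern with None placeholders can be matched against a set of triples. The names `match` and `TRIPLES` are made up for this illustration and the short strings stand in for full URIs; RDFLib's own implementation works on indexed stores and is more involved.

```python
# Toy triple store: plain tuples of (subject, predicate, object).
# Short strings stand in for the full URIs used in the listings.
TRIPLES = [
    ("adam", "spouseOf", "dotty"),
    ("adam", "parentOf", "edward"),
    ("adam", "siblingOf", "beth"),
    ("beth", "siblingOf", "adam"),
    ("dotty", "parentOf", "edward"),
]


def match(triples, pattern):
    """Yield every triple that agrees with the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == have
               for want, have in zip(pattern, triple)):
            yield triple


# Everything with adam as the subject (two wildcards).
about_adam = list(match(TRIPLES, ("adam", None, None)))

# Only the predicate is fixed; keeping subject and object of each hit
# gives the same kind of pairs that subject_objects() returns.
siblings = [(s, o) for s, _, o in match(TRIPLES, (None, "siblingOf", None))]

print(about_adam)
print(siblings)
```

In this sketch a fully specified pattern simply checks whether that one triple exists, while each None widens the search by one position.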
It is also possible to store this graph in a database. In this case I followed the example and used a MySQL database. First of all I needed to install MySQL for Python using:
easy_install mysql-python
Once the MySQL server and the mysql-python library are installed, I can use RDFLib to connect to a MySQL database or create one. Listing 4 of the tutorial uses a different database that comes with the Jena library; since I am not using Jena, I'll be using the graph from listing 1 as the example.
With the following code we can open the MySQL database and put the previously created graph in it:
def listing4():
    import rdflib
    from rdflib.graph import ConjunctiveGraph
    from rdflib import plugin
    from rdflib.store import Store, VALID_STORE
    from rdflib import URIRef
    import os
    # Define type of storage and login information
    rdflib.plugin.register('MySQL', Store, 'MySQL', 'MySQL')
    default_graph_uri = "http://example.com/rdfstore"
    configString = "host=localhost,user=root,password=qwerty,db=test"
    # Get the mysql plugin. You may have to install the python mysql libraries
    store = plugin.get('MySQL', Store)('rdfstore')
    # Check whether the database exists, else create a new one
    rt = store.open(configString, create=False)
    if rt == 0:
        # There is no underlying MySQL infrastructure, create it
        store.open(configString, create=True)
    else:
        assert rt == VALID_STORE
    # create a graph, with the opened store bound at constructor arg
    graph = ConjunctiveGraph(store, identifier=URIRef(default_graph_uri))
    # Load the file into the graph only if it exists
    if os.path.isfile('IBM_example.ttl'):
        graph.parse('IBM_example.ttl', format='n3')
        # Insert graph in database
        graph.commit()
listing4()
In case there is no predefined method I can use, it is also possible to use SPARQL. SPARQL support is available for RDFLib as well; the script below registers it through the rdfextras plugin. The Python script for it can be seen here:
def listing5():
    import rdflib
    from rdflib.graph import ConjunctiveGraph
    from rdflib import plugin
    from rdflib.store import Store, VALID_STORE
    from rdflib import URIRef
    import os
    # Create configuration data
    rdflib.plugin.register('MySQL', Store, 'MySQL', 'MySQL')
    default_graph_uri = "http://example.com/rdfstore"
    configString = "host=localhost,user=root,password=qwerty,db=test"
    # Connect to the database
    store = plugin.get('MySQL', Store)('rdfstore')
    rt = store.open(configString, create=False)
    graph = ConjunctiveGraph(store, identifier=URIRef(default_graph_uri))
    listing6(graph)
def listing6(graph):
    # import library
    import rdflib
    from rdflib.graph import ConjunctiveGraph as Graph
    from rdflib.namespace import Namespace
    from rdflib.term import Literal
    from rdflib.term import URIRef
    from rdflib import plugin
    from rdflib import RDF
    import pprint
    import os
    import sys
    # Bind SPARQL processor to the rdflib.graph.Graph.query()
    plugin.register(
        'sparql', rdflib.query.Processor,
        'rdfextras.sparql.processor', 'Processor')
    plugin.register(
        'sparql', rdflib.query.Result,
        'rdfextras.sparql.query', 'SPARQLQueryResult')
    # Give the children who have an uncle and aunt
    print "Give the children who have an uncle and aunt"
    g = graph.query("""
        SELECT ?child
        WHERE {
            ?child <http://purl.org/vocab/relationship/childOf> ?parents .
            ?parents <http://purl.org/vocab/relationship/siblingOf> ?siblings
        }
        """)
    # set() removes children that are returned more than once
    for items in list(set(g)):
        print items
    # Give the children who have grandparents
    print "Give the children who have grandparents"
    g = graph.query("""
        SELECT ?child
        WHERE {
            ?child <http://purl.org/vocab/relationship/childOf> ?parents .
            ?parents <http://purl.org/vocab/relationship/childOf> ?grandparents
        }
        """)
    for items in list(set(g)):
        print items
listing5()
SPARQL tutorial
SPARQL is the query language used to search RDF stores. Since my project requires me to use a SPARQL endpoint, it is necessary to learn how it works. SPARQL borrows a lot of its syntax from SQL, so many keywords will look familiar. This is very fortunate for me since I have a background in SQL, which makes it easier to understand SPARQL.
Just like in SQL, you need a SELECT to choose which columns you want to see. Unlike in SQL, the columns do not have to exist beforehand: RDF stores don't make use of tables the way SQL stores do. If, for example, I write "SELECT ?child", I am creating a column called "child". The "?" is how SPARQL marks its variables. Now that I have defined what kind of information I want to see, I can try to find it by matching on the subjects, predicates or objects. This is done with the WHERE clause:
SELECT ?child
WHERE { ?parent <http://example.org/has_child> ?child }
This small query will return the children of every parent. As can be seen, ?parent is used in the WHERE clause but does not show up in the result at all. This is no problem, since ?parent is just a variable to catch the subjects; it could have been any "?" followed by a word. In this query the only two things of importance are the predicate (has_child) and the ?child variable that catches who the children are. Of course, if I wanted the parents too, I could have added ?parent to the SELECT clause.
As for the predicates, these come from the vocabularies used by the store. In this case it was a made-up example, but existing RDF stores use predicates from certain vocabularies to define their data. I explained this subject in a previous blog post.
It is also important to note that URIs have to be written between "<" and ">". And if I am not interested in a field and will not use its value, I don't have to define a variable for it; I can use "[]" instead. Example:
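The way SELECT and WHERE work together can be sketched in a few lines of plain Python. This is only a toy evaluator written for this post, not how RDFLib or a real SPARQL engine is implemented; the function names `is_var`, `match_pattern` and `select` and the sample data are assumptions made for the illustration.

```python
HAS_CHILD = "http://example.org/has_child"

# Sample data (made up): short strings stand in for subject/object URIs.
TRIPLES = [
    ("adam", HAS_CHILD, "fran"),
    ("dotty", HAS_CHILD, "fran"),
    ("fran", HAS_CHILD, "harriet"),
]


def is_var(term):
    """SPARQL-style variables are written with a leading question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on failure."""
    result = dict(bindings)
    for want, have in zip(pattern, triple):
        if is_var(want):
            if result.setdefault(want, have) != have:
                return None  # variable is already bound to another value
        elif want != have:
            return None  # constant term does not match
    return result


def select(variables, where, triples):
    """Evaluate the WHERE patterns in order and project the SELECT variables."""
    solutions = [{}]
    for pattern in where:
        extended = []
        for bindings in solutions:
            for triple in triples:
                new_bindings = match_pattern(pattern, triple, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        solutions = extended
    return [tuple(s[v] for v in variables) for s in solutions]


# SELECT ?child WHERE { ?parent <http://example.org/has_child> ?child }
children = select(["?child"], [("?parent", HAS_CHILD, "?child")], TRIPLES)

# Adding ?parent to the SELECT list shows the subjects as well.
pairs = select(["?parent", "?child"], [("?parent", HAS_CHILD, "?child")], TRIPLES)

print(children)
print(pairs)
```

Note that fran appears twice in `children`, once for each parent: every match of the pattern yields a row, whether or not ?parent is selected. Passing several patterns that share a variable makes the toy evaluator join them, which is the same idea as the two-line WHERE clauses in the listing 6 queries.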
SELECT ?child
WHERE { [] <http://example.org/has_child> ?child }
This would give me the same result as the first query.
Furthermore, just like in SQL you can use clauses such as GROUP BY, ORDER BY, DESC, COUNT, etc. Another very important keyword is FILTER. It allows me to test the value of a variable, for example for certain words or characters, and keep only the matching results. Example: finding the Finnish word for Saxophone:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?fin
WHERE
{
<http://dbpedia.org/resource/Saxophone> rdfs:label ?fin .
FILTER ( lang(?fin) = 'fi' )
}
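Since my project has to talk to a SPARQL endpoint, a query like this eventually has to be sent over HTTP. Below is a minimal sketch, assuming Python 3 and only the standard library, of how such a request could be prepared. The endpoint URL and the helper names `build_label_query` and `build_request` are assumptions for this illustration, and the request is only built, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Assumed endpoint for this sketch; replace it with the store you query.
ENDPOINT = "https://dbpedia.org/sparql"


def build_label_query(resource_uri, lang):
    """Return a SPARQL query for the labels of a resource in one language."""
    # Plain string formatting is fine for a sketch with trusted input only.
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label\n"
        "WHERE {\n"
        "  <%s> rdfs:label ?label .\n"
        "  FILTER ( lang(?label) = '%s' )\n"
        "}" % (resource_uri, lang)
    )


def build_request(endpoint, query):
    """Prepare a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = build_label_query("http://dbpedia.org/resource/Saxophone", "fi")
request = build_request(ENDPOINT, query)

# Decode the URL again to check that the query survives the encoding.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]

print(request.full_url)
print(decoded == query)
```

Passing `request` to `urllib.request.urlopen` would actually send it; I left that out so the sketch runs without network access.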