
Duplicate triple in RDF, authoritative view?

If a triple store contains the same triple twice, is there an authoritative position (if any exists) on this redundancy?

Additionally, should a triple store be allowed to store the same triple twice within the same context?

I ask because in rdflib you can apparently store the same triple twice (or more). This is the reader:

import rdflib
from rdflib import store

s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')

config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)

graph = rdflib.ConjunctiveGraph(s, identifier = rdflib.URIRef("urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"))
rows = graph.query("SELECT ?id ?value { ?id <http://localhost#ha> ?value . }")
for r in rows:
    print r[0], r[1]

and this is the writer:

import rdflib
from rdflib import store

s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')

config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)

graph = rdflib.ConjunctiveGraph(s, identifier = rdflib.URIRef("urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"))
graph.add( ( rdflib.URIRef("http://localhost/1000"), rdflib.URIRef("http://localhost#ha"), rdflib.Literal("18")) )
graph.commit()

This is what I obtain:

sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
table kb_7b066eca61_relations Doesn't exist
table kb_7b066eca61_relations Doesn't exist
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
sbo@dhcp-045:~/tmp/gd $ python ./writer2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
http://localhost/1000 18
sbo@dhcp-045:~/tmp/gd $ python ./writer2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
http://localhost/1000 18
http://localhost/1000 18

This looks like a bug to me. A modified version of the reader shows that both triples belong to the same context, and that there are indeed two stored triples:

len : 2
http://localhost/1000 18
http://localhost/1000 18
(rdflib.URIRef('http://localhost/1000'), rdflib.URIRef('http://localhost#ha'), rdflib.Literal(u'18'), <Graph identifier=urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52 (<class 'rdflib.Graph.Graph'>)>)
(rdflib.URIRef('http://localhost/1000'), rdflib.URIRef('http://localhost#ha'), rdflib.Literal(u'18'), <Graph identifier=urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52 (<class 'rdflib.Graph.Graph'>)>)
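Such a modified reader can be sketched as follows (a sketch only; it reuses the setup above and assumes ConjunctiveGraph's quads() iterator, which yields (subject, predicate, object, context) tuples):

import rdflib
from rdflib import store

# Same store and graph setup as reader2.py above
s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')
s.open("host=localhost,password=foo,user=foo,db=foo", create=False)
graph = rdflib.ConjunctiveGraph(s, identifier=rdflib.URIRef("urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"))

print("len : %d" % len(graph))                # number of stored triples (duplicates included)
for sub, pred, obj in graph:                  # plain triple iteration
    print("%s %s" % (sub, obj))
for quad in graph.quads((None, None, None)):  # each triple plus the context it belongs to
    print(quad)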


An RDF triple store is a set of triples, so by definition the same triple cannot be present twice. However, most RDF stores are actually quad stores (sets of RDF graphs, also known as datasets), and in that case the same triple may appear multiple times, once per graph. Depending on the store (e.g. mine, Redland), that extra graph dimension is sometimes called a context. There is no real authority here: it is up to the user to define what meaning a particular graph name/context name has.
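The difference is easy to see with rdflib's default in-memory store (a rough sketch against a recent rdflib; class locations differ in older releases, and the graph names urn:graph:a / urn:graph:b are purely illustrative):

import rdflib

t = (rdflib.URIRef("http://localhost/1000"),
     rdflib.URIRef("http://localhost#ha"),
     rdflib.Literal("18"))

# Within a single graph, triples form a set: adding the same triple twice stores it once.
g = rdflib.Graph()
g.add(t)
g.add(t)
print(len(g))   # 1

# In a quad/context-aware store, the same triple can live in several named graphs.
cg = rdflib.ConjunctiveGraph()
cg.get_context(rdflib.URIRef("urn:graph:a")).add(t)
cg.get_context(rdflib.URIRef("urn:graph:b")).add(t)
print(sum(1 for _ in cg.quads((None, None, None))))   # 2: one quad per context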


Keep in mind that a particular triple may carry different metadata than other, otherwise identical, triples: the original source of the triple, some measure of the strength of the connection, and so on. It may also be feasible simply to count the copies of a triple in order to judge the relative strength of a connection compared with other, possibly contradictory, connections. So, as always, it all depends on what you intend to do with your data.
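If you do want to treat multiplicity as a crude weight, one possibility (a hypothetical sketch, not part of the code above) is to count how often each (subject, predicate, object) occurs across the dataset's contexts:

from collections import Counter

def triple_counts(conjunctive_graph):
    # Count occurrences of each (subject, predicate, object) across all contexts.
    counts = Counter()
    for s, p, o, ctx in conjunctive_graph.quads((None, None, None)):
        counts[(s, p, o)] += 1
    return counts

# Usage with the `graph` opened in the reader above:
# for (s, p, o), n in triple_counts(graph).items():
#     print("%s %s %s : seen %d time(s)" % (s, p, o, n))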


RDF is a language for expressing factual claims, organized and grouped into graphs. If a graph contains "Alice is a Person" twice, that is just redundant; within a graph, triples are normalised, so there is no point in repeating them. However, applications, stores and SPARQL-queryable systems will often collect factual claims from different sources. The SPARQL language has the GRAPH keyword for when you want to take a multi-graph perspective and look for the same triple in different sources.
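For example, a query along these lines (illustrative only; it assumes the store exposes its named graphs to SPARQL) reports which graph each occurrence of the triple comes from:

rows = graph.query("""
    SELECT ?g ?id ?value
    WHERE { GRAPH ?g { ?id <http://localhost#ha> ?value } }
""")
for r in rows:
    print(r)   # one row per (graph, subject, value) combination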

