
Searching an RDF Graph for Partial Matches

How would I search an RDF database to find the segments of the graph that overlap the most with a sample graph?

For example, say my database stores the following arbitrary graphs:

entity1 [
    type "TOP" ;
    attr1 [
        attr11 [
            attr111 "apple" ;
        ] ;
        attr12 [
            attr121 "orange" ;
        ] ;
        attr13 [
            attr131 "banana" ;
        ] ;
    ] ;
    attr2 [
        attr21 [
            attr211 "falcon" ;
        ] ;
        attr22 [
            attr221 "pigeon" ;
        ] ;
        attr23 [
            attr231 "parrot" ;
        ] ;
    ] ;
] .
entity2 [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
] .
entity3 [
    type "TOP" ;
    attr2 [
        attr_middle [
            attr21 [
                attr211 "falcon" ;
            ] ;
            attr22 [
                attr221 "pigeon" ;
            ] ;
            attr23 [
                attr231 "parrot" ;
            ] ;
        ] ;
    ] ;
] .

And now say I have the sample graph:

sample [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
    attr13 [
        attr131 "banana" ;
    ] ;
    attr21 [
        attr211 "falcon" ;
    ] ;
    attr22 [
        attr221 "pigeon" ;
    ] ;
    attr23 [
        attr231 "parrot" ;
    ] ;
] .

Clearly, nothing in the database matches the sample perfectly, but each entity matches it partially, even though the common triples exist at different levels in each graph.

How would I find the closest matches to the sample? In this case, I'd expect a query to return, sorted best match first, [entity1, entity3, entity2].
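One way to make "overlap" concrete is to count the (predicate, literal) pairs that both graphs reach from their root, ignoring how deep each pair sits. The sketch below is plain Python rather than SPARQL and rests on its own assumptions: graphs are written as nested dicts, the blank-node labels (_:b1, ...) and the helper names to_triples, leaf_pairs and rank are hypothetical, and intermediate predicates such as attr1 or attr_middle are not scored at all.

```python
import itertools
from collections import deque

# Assumed encoding for this sketch: a graph is a set of (subject, predicate,
# object) string triples; blank nodes are labelled "_:b<n>" and any other
# object is treated as a literal value.
_fresh = itertools.count(1)


def to_triples(subject, tree, out):
    """Flatten a nested dict {predicate: literal or nested dict} into triples."""
    for predicate, value in tree.items():
        if isinstance(value, dict):
            node = f"_:b{next(_fresh)}"  # every nested block gets a new blank node
            out.add((subject, predicate, node))
            to_triples(node, value, out)
        else:
            out.add((subject, predicate, value))
    return out


def leaf_pairs(triples, root):
    """Return every (predicate, literal) pair reachable from root, at any depth."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    pairs, seen, queue = set(), {root}, deque([root])
    while queue:  # breadth-first walk through the blank nodes under root
        node = queue.popleft()
        for p, o in outgoing.get(node, []):
            if o.startswith("_:"):
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
            else:
                pairs.add((p, o))
    return pairs


def rank(database, sample):
    """Order roots by how many of the sample's leaf pairs they also reach."""
    store = set()
    for root, tree in database.items():
        to_triples(root, tree, store)
    wanted = leaf_pairs(to_triples("sample", sample, set()), "sample")
    scores = {root: len(leaf_pairs(store, root) & wanted) for root in database}
    return sorted(scores, key=scores.get, reverse=True), scores


DATABASE = {
    "entity1": {
        "type": "TOP",
        "attr1": {"attr11": {"attr111": "apple"},
                  "attr12": {"attr121": "orange"},
                  "attr13": {"attr131": "banana"}},
        "attr2": {"attr21": {"attr211": "falcon"},
                  "attr22": {"attr221": "pigeon"},
                  "attr23": {"attr231": "parrot"}},
    },
    "entity2": {
        "type": "TOP",
        "attr11": {"attr111": "apple"},
        "attr12": {"attr121": "orange"},
    },
    "entity3": {
        "type": "TOP",
        "attr2": {"attr_middle": {"attr21": {"attr211": "falcon"},
                                  "attr22": {"attr221": "pigeon"},
                                  "attr23": {"attr231": "parrot"}}},
    },
}

SAMPLE = {
    "type": "TOP",
    "attr11": {"attr111": "apple"},
    "attr12": {"attr121": "orange"},
    "attr13": {"attr131": "banana"},
    "attr21": {"attr211": "falcon"},
    "attr22": {"attr221": "pigeon"},
    "attr23": {"attr231": "parrot"},
}

ranking, scores = rank(DATABASE, SAMPLE)
print(ranking)  # ['entity1', 'entity3', 'entity2']
print(scores)   # {'entity1': 7, 'entity2': 3, 'entity3': 4}
```

With this scoring the three entities come out in the order expected above. Whether ignoring the intermediate predicates is acceptable depends on the data; a stricter score could, for example, compare predicate-path suffixes instead of single leaf pairs.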

I'm still a little new to RDF, so forgive me if my terminology is off. As I currently understand RDF databases, what I'm trying to do isn't typically how they're used. If I want to find the entities "containing" the relation attr111 = "apple" using a SPARQL query, I'd generally have to assume that relation is at a fixed location relative to each entity, whereas searching for triples at arbitrary locations relative to a "root" is much more difficult. Is that correct?


No, it is not that difficult, but your SPARQL queries may become rather long. There is no need to assume a fixed root, since you can use a variable for the root (?s in my examples below). Where the root is fixed, substitute a value for the variable.
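To illustrate what a variable root does, here is a minimal Python sketch of basic-graph-pattern matching. It is not a SPARQL engine: triples are plain string tuples, terms starting with ? are variables, and the data and the function names match and evaluate are hypothetical. The same two patterns are run once with ?root as a variable and once with the root fixed to entity2.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check the existing binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def evaluate(patterns, triples):
    """Return every binding that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data: two roots with "apple" under attr11, one without.
TRIPLES = [
    ("entity1", "attr11", "_:a"), ("_:a", "attr111", "apple"),
    ("entity2", "attr11", "_:b"), ("_:b", "attr111", "apple"),
    ("entity3", "attr21", "_:c"), ("_:c", "attr211", "falcon"),
]

# Root as a variable: every root with the pattern beneath it is returned.
any_root = [("?root", "attr11", "?x"), ("?x", "attr111", "apple")]
print([s["?root"] for s in evaluate(any_root, TRIPLES)])  # ['entity1', 'entity2']

# Root fixed: the variable is replaced by a concrete value.
fixed_root = [("entity2", "attr11", "?x"), ("?x", "attr111", "apple")]
print(evaluate(fixed_root, TRIPLES))  # [{'?x': '_:b'}]
```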

Note: if the resulting query has no variables in it, it is better phrased as an ASK query. A SELECT query with no variables returns rows without any columns (a single empty row if the WHERE clause matches, no rows if it doesn't), so the two outcomes are easy to confuse. An ASK query simply returns true or false depending on whether the WHERE clause matches.
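A small sketch of the two result shapes, assuming a WHERE clause made only of variable-free triple patterns and a graph held as a set of string tuples. The function names select_all and ask are illustrative, not any library's API.

```python
def select_all(patterns, triples):
    """SELECT * with no variables: one empty row on a match, no rows otherwise."""
    return [{}] if all(p in triples for p in patterns) else []


def ask(patterns, triples):
    """ASK: true exactly when the WHERE clause has at least one solution."""
    return bool(select_all(patterns, triples))


# Hypothetical graph: only the triple stating entity2's type.
GRAPH = {("entity2", "type", "TOP")}
hit = [("entity2", "type", "TOP")]
miss = [("entity2", "attr111", "apple")]  # "apple" is not directly on entity2

print(select_all(hit, GRAPH), select_all(miss, GRAPH))  # [{}] []
print(ask(hit, GRAPH), ask(miss, GRAPH))                # True False
```

The miss pattern also shows why depth matters: in the question, entity2 does reach "apple", but one level down under attr11, so a ground pattern that puts attr111 directly on entity2 does not match.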

If your SPARQL processor supports SPARQL 1.1, you can use property paths, e.g.:

PREFIX ex: <http://example.org/>   # placeholder namespace
SELECT * WHERE { ?s ex:predicate / ex:predicate / ex:predicate "value" }
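Roughly, a sequence path is evaluated by following one predicate per step from each candidate subject. Here is a Python sketch of that idea, with triples as string tuples and ex:predicate and "value" borrowed from the query above. The data and the names step and match_sequence_path are made up for illustration.

```python
def step(triples, frontier, predicate):
    """Objects reachable from any node in frontier through one predicate edge."""
    return {o for s, p, o in triples if s in frontier and p == predicate}


def match_sequence_path(triples, path, value):
    """All subjects ?s for which '?s p1/p2/.../pn value' holds."""
    matches = set()
    for subject in {s for s, _, _ in triples}:
        frontier = {subject}
        for predicate in path:  # one hop per predicate in the sequence
            frontier = step(triples, frontier, predicate)
        if value in frontier:
            matches.add(subject)
    return matches


# Hypothetical data: s1 reaches "value" in three hops, s2 in two.
TRIPLES = [
    ("s1", "ex:predicate", "_:b1"),
    ("_:b1", "ex:predicate", "_:b2"),
    ("_:b2", "ex:predicate", "value"),
    ("s2", "ex:predicate", "_:c1"),
    ("_:c1", "ex:predicate", "value"),
]

three_steps = ["ex:predicate"] * 3
print(match_sequence_path(TRIPLES, three_steps, "value"))              # {'s1'}
print(sorted(match_sequence_path(TRIPLES, three_steps[:2], "value")))  # ['_:b1', 's2']
```

The path length has to match exactly: the two-step path finds s2 (and the blank node _:b1, which a SPARQL engine would also bind to ?s), while the three-step path finds only s1. For data like entity3, where the interesting triples sit an unknown number of levels down, SPARQL 1.1 also provides the arbitrary-length path operators * (zero or more) and + (one or more).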

If you only have SPARQL 1.0, you have to state the match explicitly, like so:

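PREFIX ex: <http://example.org/>   # placeholder namespace
SELECT * WHERE
{
  # _:b1 and _:b2 act as unnamed variables and are not returned
  ?s ex:predicate _:b1 .
  _:b1 ex:predicate _:b2 .
  _:b2 ex:predicate "value" .
}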

Note that these two forms are semantically equivalent: the SPARQL 1.1 form is a convenient syntactic shortcut for the SPARQL 1.0 form.
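The equivalence can be checked mechanically: expand the path into one triple pattern per step, chained through blank nodes just as in the SPARQL 1.0 query, evaluate those patterns, and compare the result with walking the path directly. This is illustrative Python over string-tuple triples with made-up data; expand, evaluate and walk are hypothetical helpers, not a real SPARQL engine.

```python
def expand(subject, path, value):
    """Rewrite 'subject p1/.../pn value' as n triple patterns joined by blank nodes."""
    nodes = [subject] + [f"_:b{i}" for i in range(1, len(path))] + [value]
    return [(nodes[i], predicate, nodes[i + 1]) for i, predicate in enumerate(path)]


def is_variable(term):
    # In a query pattern, "?x" variables and "_:x" blank nodes both behave as variables.
    return term.startswith(("?", "_:"))


def evaluate(patterns, triples):
    """Return every binding that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = dict(binding)
                if all(
                    extended.setdefault(t, v) == v if is_variable(t) else t == v
                    for t, v in zip(pattern, triple)
                ):
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def walk(triples, path, value):
    """Subjects from which following the path step by step reaches value."""
    matches = set()
    for subject in {s for s, _, _ in triples}:
        frontier = {subject}
        for predicate in path:
            frontier = {o for s, p, o in triples if s in frontier and p == predicate}
        if value in frontier:
            matches.add(subject)
    return matches


# Hypothetical data: s1 reaches "value" in three hops, s2 in two.
TRIPLES = [
    ("s1", "ex:predicate", "_:b1"),
    ("_:b1", "ex:predicate", "_:b2"),
    ("_:b2", "ex:predicate", "value"),
    ("s2", "ex:predicate", "_:c1"),
    ("_:c1", "ex:predicate", "value"),
]

path = ["ex:predicate"] * 3
explicit = {sol["?s"] for sol in evaluate(expand("?s", path, "value"), TRIPLES)}
print(explicit, explicit == walk(TRIPLES, path, "value"))  # {'s1'} True
```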

Obviously, the larger the part of your graph you want to match, the larger your SPARQL query will get.
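One way to keep a growing query manageable is to generate it rather than write it by hand. The following hypothetical helper, build_query, writes the explicit SPARQL 1.0 form from a list of (predicate path, literal) leaves. The ex: namespace is a placeholder, the root is expected to be a variable, and literals are inserted without escaping, so it assumes they contain no quotes or backslashes.

```python
def build_query(leaves, root="?root"):
    """Emit a SELECT query requiring every (path, literal) leaf under root."""
    lines = ["PREFIX ex: <http://example.org/>", f"SELECT {root} WHERE", "{"]
    counter = 0
    for path, literal in leaves:
        subject = root
        for i, predicate in enumerate(path):
            if i == len(path) - 1:
                obj = f'"{literal}"'  # last step ends at the literal (not escaped)
            else:
                counter += 1
                obj = f"_:b{counter}"  # fresh blank node for each intermediate step
            lines.append(f"  {subject} {predicate} {obj} .")
            subject = obj
    lines.append("}")
    return "\n".join(lines)


print(build_query([
    (["ex:attr11", "ex:attr111"], "apple"),
    (["ex:attr12", "ex:attr121"], "orange"),
]))
```

Because every pattern has to match, the generated query returns only roots that contain all the listed leaves at exactly those paths. For a ranked partial match, one option is to generate one query per leaf (with the root fixed and ASK instead of SELECT) and count the true answers per root; leaves at varying depths, as in entity3, would additionally need SPARQL 1.1 arbitrary-length paths.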
