Searching an RDF Graph for Partial Matches

2023-02-11 03:19

How would I search an RDF database to find the segments of the graph that overlap the most with a sample graph?

For example, say my database stores the following arbitrary graphs:

entity1 [
    type "TOP" ;
    attr1 [
        attr11 [
            attr111 "apple" ;
        ] ;
        attr12 [
            attr121 "orange" ;
        ] ;
        attr13 [
            attr131 "banana" ;
        ] ;
    ] ;
    attr2 [
        attr21 [
            attr211 "falcon" ;
        ] ;
        attr22 [
            attr221 "pigeon" ;
        ] ;
        attr23 [
            attr231 "parrot" ;
        ] ;
    ] ;
] .
entity2 [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
] .
entity3 [
    type "TOP" ;
    attr2 [
        attr_middle [
            attr21 [
                attr211 "falcon" ;
            ] ;
            attr22 [
                attr221 "pigeon" ;
            ] ;
            attr23 [
                attr231 "parrot" ;
            ] ;
        ] ;
    ] ;
] .

And now say I have the sample graph:

sample [
    type "TOP" ;
    attr11 [
        attr111 "apple" ;
    ] ;
    attr12 [
        attr121 "orange" ;
    ] ;
    attr13 [
        attr131 "banana" ;
    ] ;
    attr21 [
        attr211 "falcon" ;
    ] ;
    attr22 [
        attr221 "pigeon" ;
    ] ;
    attr23 [
        attr231 "parrot" ;
    ] ;
] .

Clearly, nothing in the database matches the sample perfectly, but each entity matches it partially, even if the common triples sit at different levels in each graph.

How would I find the closest matches to the sample? In this case, I'd expect a query to return, sorted best match first, [entity1, entity3, entity2].

I'm still a little new to RDF, so forgive me if my terminology is off. As I currently understand RDF databases, what I'm trying to do isn't typically how they're used. If I want to find the entities "containing" the relation attr111 = "apple" using a SPARQL query, I'd generally have to assume that relation is at a fixed location relative to each entity, whereas searching for triples at arbitrary locations relative to a "root" is much more difficult. Is that correct?

No, it is not that difficult, although your SPARQL queries may become rather long. There is no need to assume a fixed root, since you can use a variable for the root, as shown in my examples. If the root is fixed, substitute its value for the variable.

Note: if the resulting query has no variables in it, it would be better phrased as an ASK query. If you use a SELECT query and there are no variables, you have no easy way to distinguish between a query result that matches and one that doesn't, whereas an ASK query returns either true or false depending on whether the WHERE clause matches.

If your SPARQL processor supports SPARQL 1.1, then you can use property paths, e.g.:

SELECT * WHERE { ?s ex:predicate / ex:predicate / ex:predicate "value" }

If you only have SPARQL 1.0 then you have to state the match explicitly like so:

SELECT * WHERE
{
  ?s ex:predicate _:b1 .
  _:b1 ex:predicate _:b2 .
  _:b2 ex:predicate "value" .
}

Note that semantically these two forms are actually equivalent - the SPARQL 1.1 form is a nice syntactic shortcut for the SPARQL 1.0 form.
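To make the equivalence concrete, here is a minimal Python sketch. It uses plain tuples rather than a real triple store, and the `follow_path` and `subjects_matching` helpers and the sample data are made up for illustration. It evaluates a sequence path by chaining one-hop lookups, which is what the SPARQL 1.0 pattern spells out by hand with its intermediate blank nodes:

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
TRIPLES = {
    ("s1", "ex:predicate", "b1"),
    ("b1", "ex:predicate", "b2"),
    ("b2", "ex:predicate", "value"),
    ("s2", "ex:predicate", "b3"),
    ("b3", "ex:predicate", "other"),
}

def follow_path(triples, start, predicates):
    """Return the set of nodes reached from `start` by following `predicates` in order."""
    frontier = {start}
    for pred in predicates:
        # One hop: the same join that each explicit triple pattern performs.
        frontier = {o for (s, p, o) in triples if s in frontier and p == pred}
    return frontier

def subjects_matching(triples, predicates, value):
    """Subjects ?s for which the path p1/p2/.../pn leads from ?s to `value`."""
    subjects = {s for (s, _, _) in triples}
    return sorted(s for s in subjects if value in follow_path(triples, s, predicates))

# Three hops, like ex:predicate / ex:predicate / ex:predicate "value".
matches = subjects_matching(TRIPLES, ["ex:predicate"] * 3, "value")
print(matches)
```

Only `s1` has a three-hop chain ending in `"value"`, so it is the only subject returned. The intermediate `frontier` sets play the role of `_:b1` and `_:b2` in the explicit form.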

Obviously, the larger the part of your graph you want to match, the longer your SPARQL query will get.
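If the query gets unwieldy, the overlap can also be scored outside SPARQL. What follows is only a rough sketch under stated assumptions, not a feature of any RDF store: the graphs from the question are written as nested Python dicts (so a node cannot repeat a predicate), and "closeness" is taken to be the number of (predicate, literal) leaf pairs shared with the sample, whatever depth they sit at. The `leaf_pairs` and `rank` helpers are hypothetical names.

```python
# Shared sub-structures from the question, written as nested dicts (an assumption:
# each nested dict stands for a blank node, each string for a literal).
FRUIT = {"attr11": {"attr111": "apple"},
         "attr12": {"attr121": "orange"},
         "attr13": {"attr131": "banana"}}
BIRDS = {"attr21": {"attr211": "falcon"},
         "attr22": {"attr221": "pigeon"},
         "attr23": {"attr231": "parrot"}}

DATABASE = {
    "entity1": {"type": "TOP", "attr1": FRUIT, "attr2": BIRDS},
    "entity2": {"type": "TOP",
                "attr11": {"attr111": "apple"},
                "attr12": {"attr121": "orange"}},
    "entity3": {"type": "TOP", "attr2": {"attr_middle": BIRDS}},
}
SAMPLE = {"type": "TOP", **FRUIT, **BIRDS}

def leaf_pairs(node):
    """Collect every (predicate, literal) pair found at any depth below `node`."""
    pairs = set()
    for pred, obj in node.items():
        if isinstance(obj, dict):
            pairs |= leaf_pairs(obj)  # descend into the nested node
        else:
            pairs.add((pred, obj))
    return pairs

def rank(database, sample):
    """Sort entities by how many of the sample's leaf pairs they contain, best first."""
    wanted = leaf_pairs(sample)
    scores = {name: len(wanted & leaf_pairs(graph)) for name, graph in database.items()}
    # Ties are broken by name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ranking = rank(DATABASE, SAMPLE)
print(ranking)  # [('entity1', 7), ('entity3', 4), ('entity2', 3)]
```

With this scoring the order is entity1, entity3, entity2, which is the order the question expects. The score ignores the intermediate structure entirely, so two graphs with the same leaves but different nesting score the same. Whether that is acceptable depends on the data.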
