Performing complex joins on RDF

I want to execute the following query:

SELECT ?name1 ?name2 WHERE {
    ?article1 rdf:type bench:Article .
    ?article2 rdf:type bench:Article .
    ?article1 dc:creator ?author1 .
    ?author1 foaf:name ?name1 .
    ?article2 dc:creator ?author2 .
    ?author2 foaf:name ?name2 .
    ?article1 swrc:journal ?journal .
    ?article2 swrc:journal ?journal .
}

This is a complex query, so to execute it on RDF data I want to follow this approach:

  1. Find all the common join variables: ?article1, ?article2, ?author1, ?author2, ?journal.
  2. Perform a partial join for each of them, so that output is produced per common join variable (5 output files in total).
  3. Perform the SELECT operation (SELECT ?name1 ?name2) on these 5 output files.
  4. Done.

My question is: will this produce the same output as a normal join, or not?


My guess is that you need something like ...

SELECT ?article ?name WHERE {
?article rdf:type bench:Article .
?article dc:creator ?author .
?author foaf:name ?name .
FILTER ( ?article = <ARTICLE_URI_1> || ?article = <ARTICLE_URI_2> || ...
... || ?article = <ARTICLE_URI_5>)
}

Creating a filter with the article URIs to match will give you back five rows, rather than one row with five names, which I think is what your query would return. It is also important to retrieve the article URI, so that you can trace each name back to its article.

Also, your query is not using SPARQL joins as expected: you have three separate blocks of isolated patterns, which can cause a combinatorial explosion depending on the structure of your data.

Edit: join analysis of the query in the question

The join in that query will most likely produce inconsistent results. But the most optimised way to evaluate it would be to start with the most restrictive patterns. So a possible approach could be:

  1. Find ?article1 and ?article2 by applying ?article1 rdf:type bench:Article . and ?article2 rdf:type bench:Article .

  2. Remove all values of ?article1 and ?article2 that are not in the same ?journal. This is due to the patterns ?article1 swrc:journal ?journal . and ?article2 swrc:journal ?journal .

  3. Substitute the values of ?article1 and ?article2 into ?article1 dc:creator ?author1 . and ?article2 dc:creator ?author2 . respectively, to get ?author1 and ?author2.

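  4. Do an equivalent step to get ?name1 and ?name2.

  5. Project the selected variables ?name1 and ?name2. They are not joined to each other directly, only through ?journal, so every pairing of names within the same journal appears, like a cartesian product per journal.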
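
As a rough illustration of the steps above, here is a minimal Python sketch of evaluation by substitution. It is not how any particular triple store works. The triples, the names (a1, p1, Alice, ...) and the helper functions match and evaluate are all made up for the example.

```python
# Toy data: (subject, predicate, object) triples. All values are hypothetical.
TRIPLES = [
    ("a1", "rdf:type", "bench:Article"),
    ("a2", "rdf:type", "bench:Article"),
    ("a3", "rdf:type", "bench:Article"),
    ("a1", "swrc:journal", "j1"),
    ("a2", "swrc:journal", "j1"),
    ("a3", "swrc:journal", "j2"),
    ("a1", "dc:creator", "p1"),
    ("a2", "dc:creator", "p2"),
    ("a3", "dc:creator", "p3"),
    ("p1", "foaf:name", "Alice"),
    ("p2", "foaf:name", "Bob"),
    ("p3", "foaf:name", "Carol"),
]


def match(pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must be equal to the triple's term.
                ok = False
                break
        if ok:
            yield new


def evaluate(patterns):
    """Substitute the bindings found so far into each subsequent pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings


# Patterns in the order of the steps: types, journals, creators, names.
PATTERNS = [
    ("?article1", "rdf:type", "bench:Article"),
    ("?article2", "rdf:type", "bench:Article"),
    ("?article1", "swrc:journal", "?journal"),
    ("?article2", "swrc:journal", "?journal"),
    ("?article1", "dc:creator", "?author1"),
    ("?article2", "dc:creator", "?author2"),
    ("?author1", "foaf:name", "?name1"),
    ("?author2", "foaf:name", "?name2"),
]

# Final projection on ?name1 ?name2.
rows = sorted((b["?name1"], b["?name2"]) for b in evaluate(PATTERNS))
print(rows)
```

On this toy data the first two patterns give 9 article pairs, the journal patterns cut that down to 5, and the final rows pair up names only within the same journal.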

Bottom line, the answer to your question is: yes, the output is produced based on the join variables. Most of the time the join is also executed by substituting values into subsequent patterns. Optimizations are normally based on starting with the most restrictive patterns and substituting as soon as possible.


I'm not quite sure what you're trying to do. Are you implementing a SPARQL query evaluator and the results are incorrect?

In any case, yes, this query can be executed using joins. I don't know what you mean by partial join. All the joins here are normal equijoins. One valid join order would be:

  1. join ?article1 rdf:type bench:Article with ?article1 dc:creator ?author1 (on ?article1)
  2. join result with ?author1 foaf:name ?name1 (on ?author1)
  3. join result with ?article1 swrc:journal ?journal (on ?article1)
  4. join result with ?article2 swrc:journal ?journal (on ?journal)
  5. join result with ?article2 rdf:type bench:Article (on ?article2)
  6. join result with ?article2 dc:creator ?author2 (on ?article2)
  7. join result with ?author2 foaf:name ?name2 (on ?author2)

This may not be the best join order and, of course, it also depends on the actual join algorithms used.
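
To make the join order above concrete, here is a small Python sketch that runs it with a simple hash equijoin over tables of variable bindings. The data and the helpers scan, hash_join and run are assumptions made for the example, not the API of any triple store, and scan only handles patterns with a constant predicate.

```python
from collections import defaultdict

# Toy data: (subject, predicate, object) triples. All values are hypothetical.
TRIPLES = [
    ("a1", "rdf:type", "bench:Article"), ("a2", "rdf:type", "bench:Article"),
    ("a3", "rdf:type", "bench:Article"),
    ("a1", "swrc:journal", "j1"), ("a2", "swrc:journal", "j1"),
    ("a3", "swrc:journal", "j2"),
    ("a1", "dc:creator", "p1"), ("a2", "dc:creator", "p2"),
    ("a3", "dc:creator", "p3"),
    ("p1", "foaf:name", "Alice"), ("p2", "foaf:name", "Bob"),
    ("p3", "foaf:name", "Carol"),
]


def scan(s, p, o):
    """Return the binding table (list of dicts) for one triple pattern."""
    rows = []
    for ts, tp, to in TRIPLES:
        if tp != p:
            continue
        if not s.startswith("?") and s != ts:
            continue
        if not o.startswith("?") and o != to:
            continue
        row = {}
        if s.startswith("?"):
            row[s] = ts
        if o.startswith("?"):
            row[o] = to
        rows.append(row)
    return rows


def hash_join(left, right):
    """Equijoin two binding tables on all the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for row in right:
        index[tuple(row[v] for v in shared)].append(row)
    # With no shared variable every key is (), giving a cartesian product.
    return [{**l, **r}
            for l in left
            for r in index.get(tuple(l[v] for v in shared), ())]


# The join order listed above, one pattern per step.
ORDER = [
    ("?article1", "rdf:type", "bench:Article"),
    ("?article1", "dc:creator", "?author1"),
    ("?author1", "foaf:name", "?name1"),
    ("?article1", "swrc:journal", "?journal"),
    ("?article2", "swrc:journal", "?journal"),
    ("?article2", "rdf:type", "bench:Article"),
    ("?article2", "dc:creator", "?author2"),
    ("?author2", "foaf:name", "?name2"),
]


def run(order):
    """Join the patterns left to right, then project ?name1 ?name2."""
    result = scan(*order[0])
    for pattern in order[1:]:
        result = hash_join(result, scan(*pattern))
    return sorted((r["?name1"], r["?name2"]) for r in result)


print(run(ORDER))
# A different order yields the same rows; only the intermediate sizes differ.
print(run(ORDER) == run(list(reversed(ORDER))))
```

Note that each step joins against the accumulated result. If the per-variable outputs were kept as separate files and the SELECT were applied to each one independently, the link between ?name1 and ?name2 through ?journal would be lost.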

You could also see what other triple stores do with the query. For example in Sesame, after preparing a query, you can examine the query plan by calling SailQuery.getParsedQuery().getTupleExpr().
