DBpedia query returns some musicals more than once despite filters
I'm trying to use a SPARQL query against DBpedia to retrieve a list of musicals and some associated properties. However, despite using the appropriate filters (as far as I can tell), the results include many of the musicals more than once. Here is my query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?label ?abstract ?book ?music ?lyrics
WHERE {
?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
rdfs:label ?label ;
dbo:abstract ?abstract ;
dbpprop:book ?book ;
dbpprop:lyrics ?lyrics ;
dbpprop:music ?music .
FILTER (LANG(?label) = 'en')
FILTER (LANG(?abstract) = 'en')
FILTER (LANG(?book) = 'en')
FILTER (LANG(?lyrics) = 'en')
FILTER (LANG(?music) = 'en')
}
The resulting list has many duplicate entries. If you paste the query into the DBpedia SPARQL Explorer, you'll see that, starting with 'Mamma Mia!', many of the musicals appear more than once.
Any idea what I'm missing to get unique results with no duplicates? Thanks!
[Edited by glenn mcdonald to clarify that it's musicals which are "duplicated" here, not triples.]
SPARQL returns variable-bindings. Your "duplicates" are cartesian products of multiples in your projected properties. Mamma Mia has multiple music writers and multiple lyricists, so you get every possible combination of them that could produce a row in your table.
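One way to see this multiplication is to count the values per musical. The query below is only a diagnostic sketch: it assumes the endpoint supports SPARQL 1.1 aggregates, and the variable names ?musicCount, ?lyricsCount and ?rowsProduced are made up for illustration. It omits the language filters, so the counts may differ from the filtered query in the question.

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>

# For each musical, count the distinct music and lyrics values and the
# number of result rows the basic graph pattern produces for it.
SELECT ?play
       (COUNT(DISTINCT ?music)  AS ?musicCount)
       (COUNT(DISTINCT ?lyrics) AS ?lyricsCount)
       (COUNT(*)                AS ?rowsProduced)
WHERE {
  ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
}
GROUP BY ?play
# Keep only the musicals that yield more than one row.
HAVING (COUNT(*) > 1)
ORDER BY DESC(?rowsProduced)
```

For a musical with several values for both properties, ?rowsProduced should come out as roughly the product of the two counts, which is the cartesian product described above.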
What a pain, huh? The "solution" is to use CONSTRUCT instead of SELECT, and deal with getting back a graph instead of a table. Maybe like this:
http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0A++++PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0A++++PREFIX+dbpprop%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0A++++CONSTRUCT+%7B%0D%0A++++++++%3Fplay+rdfs%3Alabel+%3Flabel+%3B%0D%0A++++++++++++dbo%3Aabstract+%3Fabstract+%3B%0D%0A++++++++++++dbpprop%3Abook+%3Fbook+%3B%0D%0A++++++++++++dbpprop%3Alyrics+%3Flyrics+%3B%0D%0A++++++++++++dbpprop%3Amusic+%3Fmusic+.%0D%0A++++%7D%0D%0A++++WHERE+%7B+%0D%0A++++++++%3Fplay+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ABroadway_musicals%3E+%3B%0D%0A++++++++++++rdfs%3Alabel+%3Flabel+%3B%0D%0A++++++++++++dbo%3Aabstract+%3Fabstract+%3B%0D%0A++++++++++++dbpprop%3Abook+%3Fbook+%3B%0D%0A++++++++++++dbpprop%3Alyrics+%3Flyrics+%3B%0D%0A++++++++++++dbpprop%3Amusic+%3Fmusic+.%0D%0A++++++++FILTER+%28LANG%28%3Flabel%29+%3D+%27en%27%29++++%0D%0A++++++++FILTER+%28LANG%28%3Fabstract%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Fbook%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Flyrics%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Fmusic%29+%3D+%27en%27%29%0D%0A++++%7D
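For readability, the query encoded in the link above decodes to the following. It is the question's query with SELECT replaced by a CONSTRUCT template; only the whitespace has been tidied.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

# Return a graph of triples rather than a table of bindings.
CONSTRUCT {
  ?play rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
}
WHERE {
  ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
  FILTER (LANG(?label) = 'en')
  FILTER (LANG(?abstract) = 'en')
  FILTER (LANG(?book) = 'en')
  FILTER (LANG(?lyrics) = 'en')
  FILTER (LANG(?music) = 'en')
}
```

Because the result is a set of triples, each (musical, property, value) statement appears once, however many combinations the WHERE clause matched.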
Are the duplicates exact duplicates, i.e. is every value for every variable identical in each duplicate result?
If so, add the DISTINCT keyword after SELECT to force the SPARQL engine to discard duplicate solutions.
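Applied to the query in the question, that change would look like this. Only the SELECT line differs.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

# DISTINCT drops rows that are identical in every projected variable.
SELECT DISTINCT ?label ?abstract ?book ?music ?lyrics
WHERE {
  ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
  FILTER (LANG(?label) = 'en')
  FILTER (LANG(?abstract) = 'en')
  FILTER (LANG(?book) = 'en')
  FILTER (LANG(?lyrics) = 'en')
  FILTER (LANG(?music) = 'en')
}
```

Rows that differ in even one column, such as a different ?music or ?lyrics value for the same musical, are not affected by DISTINCT.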
If not, then Glenn is entirely correct: because some of the properties have multiple values, you get one result for each combination of those values. There are more complex workarounds using subqueries, GROUP BY, etc., but they tend to lead to less efficient queries. Sometimes you just have to deal with the duplicates on the client side.
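As a rough sketch of the GROUP BY route, the query below collapses each musical into one row by concatenating its multiple values. It assumes the endpoint supports SPARQL 1.1 aggregates (GROUP_CONCAT, SAMPLE); the output names ?title, ?summary, ?books, ?composers and ?lyricists and the ", " separator are arbitrary choices for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

# One row per musical: pick a single label and abstract, and join the
# multi-valued properties into comma-separated strings.
SELECT ?play
       (SAMPLE(?label)    AS ?title)
       (SAMPLE(?abstract) AS ?summary)
       (GROUP_CONCAT(DISTINCT STR(?book);   SEPARATOR=", ") AS ?books)
       (GROUP_CONCAT(DISTINCT STR(?music);  SEPARATOR=", ") AS ?composers)
       (GROUP_CONCAT(DISTINCT STR(?lyrics); SEPARATOR=", ") AS ?lyricists)
WHERE {
  ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
  FILTER (LANG(?label) = 'en')
  FILTER (LANG(?abstract) = 'en')
  FILTER (LANG(?book) = 'en')
  FILTER (LANG(?lyrics) = 'en')
  FILTER (LANG(?music) = 'en')
}
GROUP BY ?play
```

The engine still has to build every combination before grouping, which is the efficiency cost mentioned above.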