开发者

How to lazily evaluate XQuery order by when the results are going to be a small subset anyway?

Imagine you have a large number of records inside an XQuery-based XML database:

<widgets>
   <widget id="1" name="Foo Widget" price="19.99" />
   <widget id="2" name="Bar Widget" price="29.99" />
   <widget id="3" name="Baz Widget" price="39.99" />
   <!-- etc. -->
</widget>

By a 'large number', I mean a million or more.

You wish to retrieve one item from the list randomly using XQuery:

let $widgets := for $widget in //widgets/widget
  order by util:random()
  return $widget

for $val in开发者_运维问答 subsequence($widgets, 1, 1)
  return $val

When the number of records grows, the evaluation takes an extravagant amount of time to run as it seems to load everything from the database and reorder it in memory. I think that might be O(n log 2n). Sigh-inducing slowness.

Is there a lazier, better way of doing this?

There's the "get a count of the number of items, then randomly pick a number from zero to count" method, which I'd rather avoid.

Ideally, the database could do it, if there were some kind of feature like:

let $widgets := for $widget in //widgets/widget
  order by util:random()
  limit 1
  return $widget

This would be FLOLWR, I guess. But it's not in the XQuery spec, even though it's a common enough thing one might do in SQL (or indeed SPARQL or a number of other query languages).

Is there any way of getting this? Adding a where clause would do it, but where clauses get evaluated before order clauses, which doesn't really help.

Any suggestions? (The application sending the XQueries is written in Java, and the XML database is eXist, if that helps with any slightly more curveball, off-the-wall ideas.)


The optimizer might do a better job if you don't use an intermediate variable, but that's a big might.

subsequence(
 for $widget in //widgets/widget
  order by util:random()
  return $widget
 ,1,1)

I suspect the "method you'd rather avoid" will perform better, but the proof is in the benchmarking.

//widgets/widget[util:random(count(//widgets/widget))]
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜