Returning an OFFSET subquery result in sqlite3
I'm selecting a random row from a table in SQLite by using a subquery to determine a random OFFSET value:
SELECT id, prev_node, next_node FROM edges WHERE prev_node = ? LIMIT 1
OFFSET abs(random())%(SELECT count(*) FROM edges WHERE prev_node = ?);
This is functionally correct for my task, but it requires two hits to an index:
0|0|TABLE edges WITH INDEX edges_all_prev
0|0|TABLE edges WITH INDEX edges_all_prev
The query is for a random walk that is very likely to visit the same node more than once, so as the number of edges grows it would be helpful to cache the result of the SELECT count(*)
subquery.
Can I select the value of that subquery along with my other returned values?
Looking at the VDBE dump for the query, that value is just out of reach. It's in register 8 (moved there in step 21) while the result row is being created from registers 16-18 (step 42):
0|Trace|0|0|0||00|
1|Integer|1|1|0||00|
2|Function|0|0|5|random(0)|00|
3|Function|0|5|4|abs(1)|01|
4|If|7|23|0||00|
5|Integer|1|7|0||00|
6|Null|0|8|0||00|
7|Integer|1|9|0||00|
8|Null|0|10|0||00|
9|Variable|2|11|1||00|
10|Goto|0|47|0||00|
11|OpenRead|2|15|0|keyinfo(4,BINARY,BINARY)|00|
12|IsNull|11|18|0||00|
13|Affinity|11|1|0|d|00|
14|SeekGe|2|18|11|1|00|
15|IdxGE|2|18|11|1|01|
16|AggStep|0|0|10|count(0)|00|
17|Next|2|15|0||00|
18|Close|2|0|0||00|
19|AggFinal|10|0|0|count(0)|00|
20|SCopy|10|13|0||00|
21|Move|13|8|1||00|
22|IfZero|9|23|-1||00|
23|Remainder|8|4|2||00|
24|MustBeInt|2|0|0||00|
25|IfPos|2|27|0||00|
26|Integer|0|2|0||00|
33|Affinity|14|1|0|d|00|
34|SeekGe|3|45|14|1|00|
35|IdxGE|3|45|14|1|01|
36|AddImm|2|-1|0||00|
37|IfNeg|2|39|0||00|
38|Goto|0|44|0||00|
39|IdxRowid|3|16|0||00|
40|Column|3|0|17||00|
41|Column|3|1|18||00|
42|ResultRow|16|3|0||00|
43|IfZero|1|45|-1||00|
44|Next|3|35|0||00|
45|Close|3|0|0||00|
46|Halt|0|0|0||00|
47|Trans开发者_运维问答action|0|0|0||00|
48|VerifyCookie|0|27|0||00|
49|TableLock|0|9|0|edges|00|
50|Goto|0|11|0||00|
I could create a function that saves the count after it's calculated, but is there a straightforward SQL syntax for requesting the result of that subquery?
I wrote the function to save the counts I mentioned at the end of the original post, so here's one possible answer for removing the duplicate index search. I would still like to know if this is doable with straight SQL.
I created a passthrough user function to capture the count from the subquery as the offset is calculated.
So instead of the original query:
SELECT id, prev_node, next_node FROM edges WHERE prev_node = ? LIMIT 1
OFFSET abs(random())%(
SELECT count(*) FROM edges WHERE prev_node = ?);
I have something more like this:
SELECT id, prev_node, next_node FROM edges WHERE next_node = ? LIMIT 1
OFFSET abs(random())%(
cache(?, (SELECT count(*) FROM edges WHERE prev_node = ?));
The first argument to cache() is a unique identifier for that count. I could just use the value of prev_node, but due to the application I need to be able to cache the counts for forward and backward walks separately. So I'm using "$direction:$prev_node_id" as the key.
The cache function looks like this (using Python):
_cache = {}
def _cache_count(self, key, count):
self._cache[key] = count
return count
conn.create_function("cache", 2, self._cache_count)
And then in the random walk function, I can cons up the hash key and check whether the count is already known. If it is, I use a variant of the main query that doesn't include the subquery:
uncached = "SELECT id, next_node, prev_node " \
"FROM edges WHERE prev_node = :last LIMIT 1 " \
"OFFSET abs(random())%cache(:key, " \
" (SELECT count(*) FROM edges WHERE prev_node = :last))"
cached = "SELECT id, next_node, prev_node, has_space, count " \
"FROM edges WHERE prev_node = :last LIMIT 1 " \
"OFFSET abs(random())%:count"
key = "%s:%s" % (direction, last_node)
if key in cache:
count = cache[key]
query = cached
args = dict(last=last_node, count=count)
else:
query = uncached
args = dict(last=last_node, key=key)
row = c.execute(query, args).fetchone()
The cached queries run about twice as fast as the uncached on average (5.7us vs. 10.9us).
精彩评论