开发者

How to optimize retrieval of most occurring values (hundreds of millions of rows)

I'm trying to retrieve some most occurring values from a SQLite table containing a few hundreds of millions of rows.

The query so far may look like this:

开发者_如何学CSELECT value, COUNT(value) AS count FROM table GROUP BY value ORDER BY count DESC LIMIT 10

There is a index on the value field.

However, with the ORDER BY clause, the query takes so much time I've never seen the end of it.

What could be done to drastically improve such queries on such big amount of data?

I tried to add a HAVING clause (e.g: HAVING count > 100000) to lower the number of rows to be sorted, without success.

Note that I don't care much on the time required to do the insertion (it still need to be reasonable, but priority is given to the selection), so I'm opened for solutions suggesting computation at insertion time ...

Thanks in advance,


1) create a new table where you'll store one row per unique "value" and the "count", put a descending index on the count column
2) add a trigger to the original table, where you maintain this new table (inset and update) as necessary to increment/decrement the count.
3) run your query off this new table, which will run fast because of the descending count index


this query forces you to look at every row in the table. that is what is taking time.

I almost never recommend this, but in this case, you could maintain the count in a denormalized fashion in an external table.

place the value and count into another table during insert, update, and delete via triggers.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜