partial results from a long-running SELECT query?

2022-12-18 07:52 问答作者：

We are issuing some long running queries on a mysql database. (The context is offline data analysis, not an application.) How we will proceed in research terms depends on the results we obtain along the way. It would be useful for us to be able to view (partial) results as they are generated by a SELECT statement -- before the query completes.

Is this possible? Or are we stuck with waiting until the query completes (which given the size of the data开发者_运维百科set can take a couple of hours) to view results which were generated in the very first seconds it ran?

Thank you for any help.

I'm going to hazard a guess that you have ORDER BY or GROUP BY as part of your query.

Most database engines I've used all starts streaming data back to the client as soon as it can, even if it hasn't fetched them all internally yet. However, once you throw GROUP BY or ORDER BY into the mix, the engine doesn't know what the first row will look like until it has produced the entire data set server-side, which is why you're left waiting for a long time.

The simplest thing to try is to use unbuffered queries. Then mysql will start delivering data as soon as it can, rather than when it has everything ready (and buffered). Depending on your query, this may not help.

To really speed things up, you need to break up your query. Not just using LIMIT, that's not going to save you much time depending on your query. For example, if you have an ORDER BY, pretty much the whole result set will have to be calculated first. You would only save the time it would take to deliver less data across the network.

Split up your queries by doing a filter. If you have a field that is indexed that you can do range searches on (i.e. auto increment), then break up your query into multiple queries using that field. For example:

SELECT * FROM db WHERE field1 BETWEEN 1 AND 10000;
SELECT * FROM db WHERE field1 BETWEEN 10000 AND 20000;
...

Then you can combine the results afterward. Many times multiple queries like this will complete faster than the equivalent single query. But if you do have an ORDER BY or GROUP BY, this may not be possible. But you could still try breaking it up into smaller queries, join them with a UNION and select on the UNION with your grouping and order by. Believe or not, this can still be much quicker than the equivalent single query. You just have to get the individual queries processing a small enough data set to make them quick.

SELECT field1, SUM(field3) field3, SUM(item_count) item_count FROM 
(
SELECT field1, SUM(field3) field3, COUNT(item) item_count FROM db WHERE field1 BETWEEN 1 AND 10000 GROUP BY field1
UNION
SELECT field1, SUM(field3) field3, COUNT(item) item_count FROM db WHERE field1 BETWEEN 10000 AND 20000 GROUP BY field1
UNION
...
) AS sub_queries GROUP BY field1

Divide and conquer. Using this technique I've sometimes reduced query times from an hour down to a minute or two.

Sorry for adding this as a new answer, but the "add comment" button still doesn't show :

@Lasse,

The question sounded to me like the OP was interested in "intermediately knowing about the current value of, say, a running sum that is being computed".

That cannot be done, period.

If the OP's question was rather in the direction of what you indicate, which is all about getting "early subsets of the full result set", then my suggestion would of course be to resort to techniques of quota queries. You know, "OPTIMIZE FOR 20 ROWS" and that sort of stuff.

Returning intermediate results while the "full" query" is still in progress, is against the spirit of how SQL, and even the relational model, was originally conceived.

The RM, and even SQL, were conveived to return only full-and-final results once those are "fully-and-finally" computed.

If you want to get statistically reliable approximations of the final result that are based on a subset of the population, you HAVE TO RESORT to techniques of statistics and extrapolation.

继续阅读：database select sql

partial results from a long-running SELECT query?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？