开发者

partial results from a long-running SELECT query?

We are issuing some long running queries on a mysql database. (The context is offline data analysis, not an application.) How we will proceed in research terms depends on the results we obtain along the way. It would be useful for us to be able to view (partial) results as they are generated by a SELECT statement -- before the query completes.

Is this possible? Or are we stuck with waiting until the query completes (which given the size of the data开发者_运维百科set can take a couple of hours) to view results which were generated in the very first seconds it ran?

Thank you for any help.


I'm going to hazard a guess that you have ORDER BY or GROUP BY as part of your query.

Most database engines I've used all starts streaming data back to the client as soon as it can, even if it hasn't fetched them all internally yet. However, once you throw GROUP BY or ORDER BY into the mix, the engine doesn't know what the first row will look like until it has produced the entire data set server-side, which is why you're left waiting for a long time.


The simplest thing to try is to use unbuffered queries. Then mysql will start delivering data as soon as it can, rather than when it has everything ready (and buffered). Depending on your query, this may not help.

To really speed things up, you need to break up your query. Not just using LIMIT, that's not going to save you much time depending on your query. For example, if you have an ORDER BY, pretty much the whole result set will have to be calculated first. You would only save the time it would take to deliver less data across the network.

Split up your queries by doing a filter. If you have a field that is indexed that you can do range searches on (i.e. auto increment), then break up your query into multiple queries using that field. For example:

SELECT * FROM db WHERE field1 BETWEEN 1 AND 10000;
SELECT * FROM db WHERE field1 BETWEEN 10000 AND 20000;
...

Then you can combine the results afterward. Many times multiple queries like this will complete faster than the equivalent single query. But if you do have an ORDER BY or GROUP BY, this may not be possible. But you could still try breaking it up into smaller queries, join them with a UNION and select on the UNION with your grouping and order by. Believe or not, this can still be much quicker than the equivalent single query. You just have to get the individual queries processing a small enough data set to make them quick.

SELECT field1, SUM(field3) field3, SUM(item_count) item_count FROM 
(
SELECT field1, SUM(field3) field3, COUNT(item) item_count FROM db WHERE field1 BETWEEN 1 AND 10000 GROUP BY field1
UNION
SELECT field1, SUM(field3) field3, COUNT(item) item_count FROM db WHERE field1 BETWEEN 10000 AND 20000 GROUP BY field1
UNION
...
) AS sub_queries GROUP BY field1

Divide and conquer. Using this technique I've sometimes reduced query times from an hour down to a minute or two.


Sorry for adding this as a new answer, but the "add comment" button still doesn't show :

@Lasse,

The question sounded to me like the OP was interested in "intermediately knowing about the current value of, say, a running sum that is being computed".

That cannot be done, period.

If the OP's question was rather in the direction of what you indicate, which is all about getting "early subsets of the full result set", then my suggestion would of course be to resort to techniques of quota queries. You know, "OPTIMIZE FOR 20 ROWS" and that sort of stuff.


Returning intermediate results while the "full" query" is still in progress, is against the spirit of how SQL, and even the relational model, was originally conceived.

The RM, and even SQL, were conveived to return only full-and-final results once those are "fully-and-finally" computed.

If you want to get statistically reliable approximations of the final result that are based on a subset of the population, you HAVE TO RESORT to techniques of statistics and extrapolation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜