Is it faster to use a complicated boolean to limit a ResultSet at the MySQL end or at the Java end?
Let's say I have a really big table filled with lots of data (say, enough not to fit comfortably in memory), and I want to analyze a subset of the rows.
Is it generally faster to do:
SELECT column1, column2, ... , columnN FROM table WHERE (some complicated boolean clause);
and then use the ResultSet, or is it faster to do:
SELECT column1, column2, ... , columnN FROM table;
and then iterate over the ResultSet, accepting or rejecting rows based on a Java version of the boolean condition?
I think it comes down to whether the Java iterator/boolean evaluator is faster than the MySQL boolean evaluator.
It is almost certainly faster to send the condition to the database.
- You avoid transferring lots of rows whose data you don't need.
- The database might use something faster than a table scan. It may be able to use an index which allows it to more quickly find the interesting rows without having to check the conditions on every row.
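To make the two points above concrete, here is a minimal JDBC sketch of sending the condition to the database. The table name `big_table`, the columns, and the condition itself are hypothetical stand-ins for the question's "complicated boolean clause", and the caller is assumed to supply an open `Connection` to MySQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class FilterInDatabase {
    // Hypothetical table, columns, and condition, for illustration only.
    static final String SQL =
        "SELECT column1, column2 FROM big_table WHERE column1 > ? AND column2 = ?";

    /** Counts matching rows; only rows that satisfy the WHERE clause are sent to the client. */
    static int countMatching(Connection conn, int minValue, String wanted) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, minValue);    // first placeholder
            ps.setString(2, wanted);   // second placeholder
            try (ResultSet rs = ps.executeQuery()) {
                int n = 0;
                while (rs.next()) {
                    n++; // analyze rs.getInt("column1") / rs.getString("column2") here
                }
                return n;
            }
        }
    }
}
```

If an index exists on a column used in the condition, MySQL may be able to use it here; a Java-side filter never gets that chance, because every row has to be fetched before it can be tested.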
I think it comes down to whether the Java iterator/boolean evaluator is faster than the MySQL boolean evaluator.
No. The deciding factor will almost certainly be the amount of data that has to be transported over the network (and assorted overhead). Reducing the result set size on the DB server is the right thing to do 99% of the time. This is especially true in complex queries where it could lead to smaller joins.
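A rough back-of-the-envelope sketch of the transfer argument, in Java. Every number below is an assumption chosen for illustration (table size, row width, and the fraction of rows that match), and the estimate ignores protocol overhead.

```java
final class TransferEstimate {
    /** Payload bytes sent to the client for rowsReturned rows of bytesPerRow each. */
    static long bytesTransferred(long rowsReturned, long bytesPerRow) {
        return Math.multiplyExact(rowsReturned, bytesPerRow);
    }

    public static void main(String[] args) {
        long totalRows = 10_000_000L;        // assumed table size
        long bytesPerRow = 200L;             // assumed average row width
        long matchingRows = totalRows / 100; // assumed: 1% of rows satisfy the condition

        System.out.println("No WHERE clause:   "
            + bytesTransferred(totalRows, bytesPerRow) + " bytes");
        System.out.println("With WHERE clause: "
            + bytesTransferred(matchingRows, bytesPerRow) + " bytes");
    }
}
```

Under these assumptions the unfiltered query moves about 2 GB and the filtered one about 20 MB, before any Java code gets to evaluate a single row.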
As a general rule, the database wins. That will almost certainly be the case for you. If you want to be sure though, profile it. I have run into cases in other languages where the overhead of transferring a lot of data was offset by the fact that some of the processing could be done outside of the DB much faster than in it. If the boolean condition you are evaluating is extremely complex to express in relational terms, you could see a benefit in evaluating it in Java, but it is extremely unlikely.
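If you do want to profile it, a crude wall-clock comparison is often enough to settle the question. The helper below is only a sketch; `QueryTimer` and its method names are made up for this example, and a proper benchmark would also control for warm caches on both the database and the JVM.

```java
final class QueryTimer {
    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs the task the given number of times and returns the fastest run in nanoseconds. */
    static long bestOf(int runs, Runnable task) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, elapsedNanos(task));
        }
        return best;
    }
}
```

Wrap each variant (the query with the WHERE clause, and the unfiltered query plus the Java-side filter) in a `Runnable`, call `bestOf` on both against the same data, and compare the results.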
The database was designed to optimize your task. Your language wasn't. And the database probably has better caching resources to prevent disk operations than does your workstation with everything else it's doing.
This is a little like asking whether you should download the data into Excel first, when the data set is bigger than Excel can hold in memory.