Query Performance, Multiple "OR's" vs. Separate Statements
Let's say I have an Oracle database table of 40MM 9-digit ZIP codes which includes the 4-digit ZIP code for each. I have a list of 800 5-digit ZIP codes and I need to find all the associated 9-digit ZIP codes. There are 40K 5-digit ZIP codes altogether. Assume we have indexed the 5-digit ZIP code field with a B*Tree (conventional) index. With database performance and scalability in mind, is the best method to:
Query for all the records in a single statement? Something like this run once:
SELECT * FROM ZIPCODE WHERE ZIP5 IN (:1, :2,... :800)
Query once for each of the 5-digit ZIP codes? Something like this run 800 times:
SELECT * FROM ZIPCODE WHERE ZIP5 = :1
Some batch size in between the two?
What do you think, and why? Some [optional] follow-up thoughts if you're in a thinking mood:
- What tests can we run to validate your thinking?
- Does the best method change as the volumes change (e.g., if we have 100 9-digit ZIP codes mapped to 10 5-digit ZIP codes, looking for 3 of those 10)? 开发者_如何学Python
- Is the best thing for the database different than the best thing for the app servers which are absorbing the query results?
Let the database server decide how to handle it. Even if it internally does a query 800 times, it will still be faster. It has to parse the query only once, and has to send results only once.
Thus, use SELECT * FROM ZIPCODE WHERE ZIP5 IN (:1, :2,... :800)
Do you have the option to create an external table? I.e.,
CREATE TABLE zip5 (zip5 varchar2(5))
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY <some oracle DIRECTORY object>
LOCATION (<yourDirectory>:'zip5 filename.txt'
)
Place your zip5 text file in the OS directory specified by your Oracle directory object, then execute:
SELECT * FROM zipcode JOIN zip5 ON (zipcode.zip5 = zip5.zip5);
This is a more generic solution than rebuilding your query each time your ZIP5 list changes.
精彩评论