How to optimize an SQL query with many thousands of WHERE clauses
I have a series of queries against a very large database, and some of their WHERE clauses contain hundreds of thousands of ORs. What is the best and easiest way to optimize such SQL queries? I found some articles about creating temporary tables and using joins, but I am unsure. I'm new to serious SQL, and so far I've been cutting and pasting the results of one query into the next.
SELECT doc_id, language, author, title FROM doc_text WHERE language='fr' OR language='es'
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR ...
SELECT ref_id, location_id FROM ref_master WHERE ref_id=098765 OR ref_id=987654 OR ref_id=876543 OR ...
SELECT location_id, location_display_name FROM location
SELECT doc_id, index_code FROM doc_index WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR ... /* x100,000 */
These unoptimized queries can take over 24 hours each. Cheers.
I think I just answered my own question... JOINs!
SELECT doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title,
       doc_ref.ref_id, ref_master.location_id, location.location_display_name,
       doc_index.doc_id, doc_index.display_heading
FROM DOC_TEXT, DOC_REF, REF_MASTER, LOCATION, DOC_INDEX
WHERE
  (doc_text.language='fr' OR doc_text.language='es')
  AND doc_text.doc_id=doc_ref.doc_id
  AND doc_ref.ref_id=ref_master.ref_id
  AND ref_master.location_id=location.location_id
  AND doc_text.doc_id=doc_index.doc_id
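One thing to watch in a WHERE clause like this: AND binds tighter than OR, so the two language tests need their own parentheses, or the 'fr' branch skips every join condition. A minimal sqlite3 sketch (the tables and data here are invented stand-ins for the schema above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE doc_text (doc_id INTEGER, language TEXT);
CREATE TABLE doc_ref  (doc_id INTEGER, ref_id INTEGER);
INSERT INTO doc_text VALUES (1, 'fr'), (2, 'es'), (3, 'fr');
INSERT INTO doc_ref  VALUES (1, 10), (2, 20);  -- doc 3 has no ref row
""")

# Without parentheses this parses as
#   language='fr' OR (language='es' AND join-condition)
# so every 'fr' row matches every doc_ref row: a partial cross join.
bad = con.execute("""
    SELECT COUNT(*) FROM doc_text, doc_ref
    WHERE doc_text.language='fr' OR doc_text.language='es'
      AND doc_text.doc_id=doc_ref.doc_id
""").fetchone()[0]

# With parentheses the join condition applies to both languages.
good = con.execute("""
    SELECT COUNT(*) FROM doc_text, doc_ref
    WHERE (doc_text.language='fr' OR doc_text.language='es')
      AND doc_text.doc_id=doc_ref.doc_id
""").fetchone()[0]

print(bad, good)  # → 5 2
```

Same data, very different row counts — on a large table the unparenthesized version is both wrong and enormous.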
The easiest way to get that done is this:
- Make indexes on the columns that are being filtered on (language, ref_id, doc_id, etc.), or at least double-check that they exist. Make them clustered if they are the primary index of the table.
- Create helper tables that contain the conditions (add/delete conditions through INSERT/DELETE statements), and index them too.
- Instead of 1000 "OR" components, make an INNER JOIN.
So...
SELECT doc_id, language, author, title
FROM doc_text
WHERE language='fr' OR language='es'
becomes
INSERT INTO language_search (language) VALUES ('fr')
INSERT INTO language_search (language) VALUES ('es')
/* and 50 more */
SELECT dt.doc_id, dt.language, dt.author, dt.title
FROM doc_text dt
INNER JOIN language_search ls ON dt.language = ls.language
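A quick way to convince yourself the two forms return the same rows — a sqlite3 sketch following the names above (the sample data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE doc_text (doc_id INTEGER, language TEXT, author TEXT, title TEXT);
CREATE TABLE language_search (language TEXT PRIMARY KEY);
CREATE INDEX idx_doc_text_language ON doc_text (language);
INSERT INTO doc_text VALUES
  (1, 'fr', 'a', 't1'), (2, 'es', 'b', 't2'), (3, 'de', 'c', 't3');
INSERT INTO language_search (language) VALUES ('fr'), ('es');
""")

# The original OR chain.
or_rows = con.execute(
    "SELECT doc_id FROM doc_text "
    "WHERE language='fr' OR language='es' ORDER BY doc_id"
).fetchall()

# The helper-table join: same rows, but the condition list
# lives in a table you can INSERT into and index.
join_rows = con.execute("""
    SELECT dt.doc_id FROM doc_text dt
    INNER JOIN language_search ls ON dt.language = ls.language
    ORDER BY dt.doc_id
""").fetchall()

print(or_rows == join_rows, or_rows)  # → True [(1,), (2,)]
```

With both sides indexed, the optimizer can drive the join from whichever side is smaller instead of evaluating a 100,000-branch OR per row.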
Instead of having a lot of conditions on the same field, you can use the IN keyword:
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id in (1234567, 1234570, 1234572, 1234596, ...)
This will make the queries shorter, but it's not certain that the performance will differ much. You should make sure that you have indexes on the relevant fields, that usually makes a huge difference for the performance.
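For illustration, here is a sqlite3 sketch (the ids and table are invented) showing that an IN list built with bound parameters returns the same rows as the equivalent OR chain; with an index on doc_id, either form can be satisfied by index lookups instead of a full scan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE doc_ref (doc_id INTEGER, ref_id INTEGER);
CREATE INDEX idx_doc_ref_doc_id ON doc_ref (doc_id);
INSERT INTO doc_ref VALUES (1234567, 1), (1234570, 2), (1234572, 3), (999, 4);
""")

ids = [1234567, 1234570, 1234572, 1234596]

# Build the IN list from placeholders; never splice raw values into SQL.
placeholders = ",".join("?" * len(ids))
in_rows = con.execute(
    f"SELECT doc_id, ref_id FROM doc_ref "
    f"WHERE doc_id IN ({placeholders}) ORDER BY doc_id",
    ids,
).fetchall()

# The equivalent OR chain, also parameterized.
or_sql = " OR ".join("doc_id=?" for _ in ids)
or_rows = con.execute(
    f"SELECT doc_id, ref_id FROM doc_ref WHERE {or_sql} ORDER BY doc_id",
    ids,
).fetchall()

print(in_rows == or_rows, in_rows)
```

Note that most engines cap the number of bound parameters or IN-list members, which is another reason the helper-table join scales better past a few thousand values.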
Edit
However, it seems that the reason that you have a lot of values to compare is that you are using the result from one query to create the next. This should of course be solved with a join instead of a dynamic query:
select
doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title,
doc_ref.ref_id, ref_master.location_id, location.location_display_name,
doc_index.doc_id, doc_index.display_heading
from DOC_TEXT
inner join DOC_REF on doc_text.doc_id = doc_ref.doc_id
inner join REF_MASTER on doc_ref.ref_id = ref_master.ref_id
inner join LOCATION on ref_master.location_id = location.location_id
inner join DOC_INDEX on doc_text.doc_id = doc_index.doc_id
where
doc_text.language in ('fr', 'es')
I think your real problem is that you are not JOINing tables.
This is a guess, but I'll bet that you run a query, collect all the IDs in your application, and then run another query WHERE the rows match the previous result. You would greatly improve performance by writing a single query with a join:
SELECT
*
FROM YourTableA a
INNER JOIN YourTableB b ON a.ID=b.ID
WHERE a. .....
then process the single result set in your application.
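The difference can be sketched in sqlite3 (tables a and b here are invented stand-ins): the first version rebuilds query B from the ids returned by query A in application code, the second lets the database do the whole thing in one pass:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER, name TEXT);
CREATE TABLE b (id INTEGER, val TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO b VALUES (1, 'one'), (3, 'three'), (4, 'four');
""")

# Anti-pattern: fetch ids, then build a second query from them.
ids = [r[0] for r in con.execute("SELECT id FROM a WHERE name != 'y'")]
placeholders = ",".join("?" * len(ids))
two_step = con.execute(
    f"SELECT id, val FROM b WHERE id IN ({placeholders}) ORDER BY id", ids
).fetchall()

# Better: one round trip, one result set, one plan for the optimizer.
joined = con.execute("""
    SELECT b.id, b.val
    FROM a
    INNER JOIN b ON a.id = b.id
    WHERE a.name != 'y'
    ORDER BY b.id
""").fetchall()

print(two_step == joined, joined)  # → True [(1, 'one'), (3, 'three')]
```

The join version also avoids shipping the intermediate id list over the wire, which is where most of the 24 hours is likely going.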