
How to optimize an SQL query with many thousands of OR conditions in its WHERE clause

I have a series of queries against a very large database, and I have hundreds of thousands of ORs in my WHERE clauses. What is the best and easiest way to optimize such SQL queries? I found some articles about creating temporary tables and using joins, but I am unsure. I'm new to serious SQL, and have been cutting and pasting the results from one query into the next.

SELECT doc_id, language, author, title FROM doc_text WHERE language='fr' OR language='es'
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR ...
SELECT ref_id, location_id FROM ref_master WHERE ref_id=098765 OR ref_id=987654 OR ref_id=876543 OR OR OR ...
SELECT location_id, location_display_name FROM location
SELECT doc_id, index_code FROM doc_index WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR x100,000

These unoptimized queries can take over 24 hours each. Cheers.


I think I just answered my own question... joining all the tables in one query!

SELECT doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title, doc_ref.ref_id, ref_master.location_id, location.location_display_name, doc_index.doc_id, doc_index.display_heading
FROM DOC_TEXT, DOC_REF, REF_MASTER, LOCATION, DOC_INDEX
WHERE
    (doc_text.language='fr' OR doc_text.language='es')
AND
    doc_text.doc_id=doc_ref.doc_id
AND
    doc_ref.ref_id=ref_master.ref_id
AND
    ref_master.location_id=location.location_id
AND
    doc_text.doc_id=doc_index.doc_id


The easiest way to get that done is this:

  • Create indexes on the columns being filtered on (language, ref_id, doc_id, etc.), or at least double-check that they already exist. Make them clustered if they are the primary key of the table.
  • Create helper tables that contain the search values (add or remove values through INSERT/DELETE statements), and index them too; a sketch follows the example below.
  • Instead of thousands of "OR" components, use an INNER JOIN:

So...

SELECT doc_id, language, author, title 
  FROM doc_text
 WHERE language='fr' OR language='es'

becomes

INSERT language_search (language) VALUES ('fr')
INSERT language_search (language) VALUES ('es')
/* and 50 more */

SELECT dt.doc_id, dt.language, dt.author, dt.title 
  FROM doc_text dt
       INNER JOIN language_search ls ON dt.language = ls.language
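
To tie this back to the first two bullets, the helper table and an index on the filtered column might look something like the following (the table name, column type, and index name are illustrative and may need adjusting for your RDBMS):

CREATE TABLE language_search (
    language VARCHAR(10) NOT NULL PRIMARY KEY  -- the primary key doubles as the index on the helper table
);

CREATE INDEX ix_doc_text_language ON doc_text (language);

With both sides of the join indexed, the optimizer is free to drive the query from the small list of values rather than evaluating a huge chain of ORs against doc_text.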


Instead of having a lot of conditions on the same field, you can use the IN keyword:

SELECT doc_id, ref_id FROM doc_ref WHERE doc_id in (1234567, 1234570, 1234572, 1234596, ...)

This will make the queries shorter, but it's not certain that the performance will differ much. You should make sure that you have indexes on the relevant fields; that usually makes a huge difference to performance.
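
For example, if doc_id is not already the primary key of doc_ref (or otherwise indexed), an index along these lines (the name is illustrative) lets the IN query above do index lookups instead of scanning the whole table:

CREATE INDEX ix_doc_ref_doc_id ON doc_ref (doc_id);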

Edit

However, it seems that the reason you have so many values to compare is that you are using the result of one query to build the next. That should, of course, be solved with a join instead of a dynamic query:

select
  doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title,
  doc_ref.ref_id, ref_master.location_id, location.location_display_name,
  doc_index.doc_id, doc_index.display_heading
from DOC_TEXT
inner join DOC_REF on doc_text.doc_id = doc_ref.doc_id
inner join REF_MASTER on doc_ref.ref_id = ref_master.ref_id
inner join LOCATION on ref_master.location_id = location.location_id
inner join DOC_INDEX on doc_text.doc_id = doc_index.doc_id
where
  doc_text.language in ('fr', 'es')


I think your real problem is that you are not JOINing tables.

This is a guess, but I'll bet that you run a query, collect all the IDs in your application, and then run another query whose WHERE clause matches the rows from the previous result. You would greatly improve performance by writing a single query with a join:

SELECT
    *
    FROM YourTableA            a
        INNER JOIN YourTableB  b ON a.ID=b.ID
    WHERE a. .....

then process the single result set in your application.
