开发者

IN vs OR in the SQL WHERE clause

When dealing with big databases, which performs bet开发者_JS百科ter: IN or OR in the SQL WHERE clause?

Is there any difference about the way they are executed?


I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the manual for MySQL if the values are constant IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL with 1000000 rows. When the column is indexed there is no discernable difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.


The best way to find out is looking at the Execution Plan.


I tried it with Oracle, and it was exactly the same.

CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );

SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );

Even though the query uses IN, the Execution Plan says that it uses OR:

--------------------------------------------------------------------------------------    
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    
--------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT  |                  |     8 |  1416 |   163   (2)| 00:00:02 |    
|*  1 |  TABLE ACCESS FULL| PERFORMANCE_TEST |     8 |  1416 |   163   (2)| 00:00:02 |    
--------------------------------------------------------------------------------------    

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR                
              "OBJECT_NAME"='DBMS_STANDARD')                                              


The OR operator needs a much more complex evaluation process than the IN construct because it allows many conditions, not only equals like IN.

Here is a list of what you can use with OR but that are not compatible with IN: greater, greater or equal, less, less or equal, LIKE and some more like the oracle REGEXP_LIKE. In addition, consider that the conditions may not always compare the same value.

For the query optimizer it's easier to to manage the IN operator because is only a construct that defines the OR operator on multiple conditions with = operator on the same value. If you use the OR operator the optimizer may not consider that you're always using the = operator on the same value and, if it doesn't perform a deeper and more complex elaboration, it could probably exclude that there may be only = operators for the same values on all the involved conditions, with a consequent preclusion of optimized search methods like the already mentioned binary search.

[EDIT] Probably an optimizer may not implement optimized IN evaluation process, but this doesn't exclude that one time it could happen(with a database version upgrade). So if you use the OR operator that optimized elaboration will not be used in your case.


I think oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should rather depend on the readability of each (where I think that IN clearly wins)


OR makes sense (from readability point of view), when there are less values to be compared. IN is useful esp. when you have a dynamic source, with which you want values to be compared.

Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have necessary indexes.


I'll add info for PostgreSQL version 11.8 (released 2020-05-14).

IN may be significantly faster. E.g. table with ~23M rows.

Query with OR:

explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev = 'Brno-město' or okres_nazev = 'Pardubice';

-- execution plan
Finalize Aggregate  (cost=725977.36..725977.37 rows=1 width=32) (actual time=4536.796..4540.748 rows=1 loops=1)
  ->  Gather  (cost=725977.14..725977.35 rows=2 width=32) (actual time=4535.010..4540.732 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=724977.14..724977.15 rows=1 width=32) (actual time=4519.338..4519.339 rows=1 loops=3)
              ->  Parallel Bitmap Heap Scan on product_erecept  (cost=15589.71..724264.41 rows=285089 width=4) (actual time=135.832..4410.525 rows=230706 loops=3)
                    Recheck Cond: (((okres_nazev)::text = 'Brno-město'::text) OR ((okres_nazev)::text = 'Pardubice'::text))
                    Rows Removed by Index Recheck: 3857398
                    Heap Blocks: exact=11840 lossy=142202
                    ->  BitmapOr  (cost=15589.71..15589.71 rows=689131 width=0) (actual time=140.985..140.986 rows=0 loops=1)
                          ->  Bitmap Index Scan on product_erecept_x_okres_nazev  (cost=0.00..8797.61 rows=397606 width=0) (actual time=99.371..99.371 rows=397949 loops=1)
                                Index Cond: ((okres_nazev)::text = 'Brno-město'::text)
                          ->  Bitmap Index Scan on product_erecept_x_okres_nazev  (cost=0.00..6450.00 rows=291525 width=0) (actual time=41.612..41.612 rows=294170 loops=1)
                                Index Cond: ((okres_nazev)::text = 'Pardubice'::text)
Planning Time: 0.162 ms
Execution Time: 4540.829 ms

Query with IN:

explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev in ('Brno-město', 'Pardubice');

-- execution plan
Aggregate  (cost=593199.90..593199.91 rows=1 width=32) (actual time=855.706..855.707 rows=1 loops=1)
  ->  Index Scan using product_erecept_x_okres_nazev on product_erecept  (cost=0.56..591477.07 rows=689131 width=4) (actual time=1.326..645.597 rows=692119 loops=1)
        Index Cond: ((okres_nazev)::text = ANY ('{Brno-město,Pardubice}'::text[]))
Planning Time: 0.136 ms
Execution Time: 855.743 ms


I did a SQL query in a large number of OR (350). Postgres do it 437.80ms.

IN vs OR in the SQL WHERE clause

Now use IN:

IN vs OR in the SQL WHERE clause

23.18ms

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜