SQL LIKE Performance with only the wildcard (%) as a value
I am wondering what the performance of a query would be like using the LIKE keyword and the wildcard as the value compared to having no where clause at all.
Consider a where clause such as "WHERE a LIKE '%'". This will match all possible values of the column 'a'. How does this compare to not having the where clause at all.
The reason I ask this is that I have an application where there are开发者_如何学C some fields that the user may specify values to search on. In some cases the user would like all the possible results. I am currently using a single query like this:
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
The values of '%' and '%' can be supplied to match all possible values for a and or b. This is convenient since I can use a single named query in my application for this. I wonder what the performance considerations are for this. Does the query optimizer reduce LIKE '%' to simply match all? I realize that because I'm using a named query (prepared statement), that may also affect the answer. I realize the answer is likely database specific. So specifically how would this work in Oracle, MS SQL Server and Derby.
The alternate approach to this would be to use 3 separate queries based on the user inputting the wildcard.
A is wildcard query:
SELECT * FROM TableName WHERE b LIKE ?
B is wildcard query:
SELECT * FROM TableName WHERE a LIKE ?
A and B are wildcards:
SELECT * FROM TableName
No wildcards:
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
Obviously having a single query is the simplest and easiest to maintain. I would rather use just the one query if performance will still be good.
SQL Server will generally see
WHERE City LIKE 'A%'
and treat it as
WHERE City >= 'A' AND City < 'B'
...and happily use an index seek if appropriate. I say 'generally', because I've seen it fail to do this simplification in certain cases.
If someone's trying to do:
WHERE City LIKE '%ville'
...then an index seek will be essentially impossible.
But something as simple as:
WHERE City LIKE '%'
will be considered equivalent to:
WHERE City IS NOT NULL
You can use whatever query analysis the DBMS offers (e.g. EXPLAIN
for MySQL, SET SHOWPLAN_ALL ON
for MS SQL (or use one of the other methods), EXPLAIN PLAN FOR
for Oracle) to see how the query will be executed.
I was hoping there would be a textbook answer to this but it sounds like it will largely vary with different database types. Most of the responses indicated that I should run a test so that is exactly what I did.
My application primarily targets the Derby, MS SQL and Oracle databases. Since derby can be run embedded and is easy to set up, I tested the performance on that first. The results were surprising. I tested the worst case scenario against a fairly large table. I ran the test 1000 times and averaged the results.
Query 1:
SELECT * FROM TableName
Query 2 (With values of a="%" and b="%"):
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
Query 1 average time: 178ms
Query 2 average time: 181ms
So performance on derby is almost the same between the two queries.
Any DBMS worth its salt would strip out LIKE '%'
clauses before even trying to run the query. I'm fairly certain I've seen DB2/z do this in its execution plans.
The prepared statement shouldn't make a difference since it should be turned into real SQL before it gets to the execution engine.
But, as with all optimization questions, measure, don't guess! DBAs exist because they constantly tune the DBMS based on actual data (which changes over time). At a bare minimum, you should time (and get the execution plans) for all variations with suitable static data to see if there's a difference.
I know that queries like:
select c from t where ((1 = 1) or (c = ?))
are optimized to remove the entire where clause before execution (on DB2 anyway and, before you ask, the construct is useful where you need to remove the effect of the where clause but still maintain the parameter placeholder (using BIRT with Javascript to modify the queries for wildcards)).
Derby also offers tools for examining the actual query plan that was used, so you can run experiments using Derby and look at the query plan that Derby chose. You can run Derby with -Dderby.language.logQueryPlan=true, and Derby will write the query plan to derby.log, or you can use the RUNTIMESTATISTICS facility, as described here: http://db.apache.org/derby/docs/10.5/tuning/ctundepth853133.html
I'm not sure if Derby will strip out the A LIKE '%' ahead of time, but I also don't think that the presence of that clause will introduce much of a slowdown in the execution speed.
I'd be quite interested to see the actual query plan output that you get in your environment, with and without the A LIKE '%' clause in place.
Oracle 10gR2 does not appear to perform a special optimisation for this situation, but it does recognise that LIKE '%' excludes nulls.
create table like_test (col1)
as select cast(dbms_random.string('U',10) as varchar2(10))
from dual
connect by level <= 1000
/
insert into like_test values (null)
/
commit
/
exec dbms_stats.gather_table_stats(user,'like_test')
explain plan for
select count(*)
from like_test
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from like_test
where col1 like '%'
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from like_test
where col1 is not null
/
select plan_table_output from table(dbms_xplan.display)
/
... giving ...
Plan hash value: 3733279756
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| LIKE_TEST | 1001 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------
... and ...
Plan hash value: 3733279756
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| LIKE_TEST | 1000 | 10000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" LIKE '%')
... and ...
Plan hash value: 3733279756
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| LIKE_TEST | 1000 | 10000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" IS NOT NULL)
Note the cardinality (rows) on the TABLE ACCESS FULL line
Depending on how the LIKE predicate is structured and on the field you're testing on, you might need a full table scan. Semantically a '%' might imply a full table scan but Sql Server does all sorts of optimization internally on queries. So the question becomes: Does Sql Server optimize on a LIKE predicate formed with'%' and throws it out of the WHERE clause?
One aspect that I think is missing from the discussion is the fact that the OP wants to use a prepared statement. At the time the statement is prepared, the database/optimizer will not be able to work out the simplifications others have mentioned and so won't be able to optimize away the a like '%'
as the actual value will not be known at prepare time.
Therefore:
- when using prepared statements, have four different statements available (0, only a, only b, both) and use the appropriate one when needed
- see if you get better performance when you don't use a prepared statement when sticking to just one statement (although then it would be pretty easy to not include 'empty' conditions)
What if a column has a non-null blank value? Your query will probably match it.
If this is a query for a real world application then try using the free text indexing features of most modern sql databases. The performance issues will become insignificant.
A simple if statement of if (A B) search a b else (A) search a else B search b else tell user they didn't specify anything
is trivial to maintain and becomes much easier to understand instead of making assumptions about the LIKE operator. You are probably going to do that in the UI anyway when you display the results "Your search for A found x" or "Your search for A B found..."
I'm not sure of the value of using a prepared statement with the kind of parameters you're describing. The reason is that you might fool the query optimizer into preparing an execution plan that would be completely wrong depending on which of the parameters were '%'.
For instance, if the statement were prepared with an execution plan using the index on column A, but the parameter for column A turned out to be '%' you may experience poor performance.
a where clause with " like '%'" as the only predicate will behave exactly the same as no where clause at all.
精彩评论