Does ISNULL or OR have better performance?
I have the SQL query:
SELECT ISNULL(t.column1, t.column2) as [result]
FROM t
I need to filter out data by [result] column. What is the best approach regarding performance from the two listed below:
WHERE ISNULL(t.column1, t.colu开发者_如何学Cmn2) = @filterValue
or:
WHERE t.column1 = @filterValue OR t.column2 = @filterValue
UPDATE: Sorry, I have forgotten to mention that the column2 is always null if the column1 is filled.
Measure, don't guess! This is something you should be doing yourself, with production-like data. We don't know the make-up of your data and that makes a big difference.
Having said that, I wouldn't do it either way. I'd create another column, column3
to store column1
if non-NULL and column2
if column1
is NULL.
Then I'd have an insert/update trigger to populate that column correctly, index it and use the screaming-banshee-speed:
select t.column3 as [result] from t
The vast majority of databases are read more often than written and it's better if this calculation is done as few times as possible (i.e., when the data changes, not every time you select it). If you want your databases to be scalable, don't use per-row functions.
It's perfectly valid to sacrifice disk space for speed and the triggers ensure that the data doesn't become inconsistent.
If adding another column and triggers is out of the question, I'd go for the or
solution since it can often be split into two parallel queries by the smarter DBMS engines.
An alternative, which MarkB gave but since deleted his answer so I'll have to go hunting for another good answer of his to upvote :-), is to use UNION ALL
. If your DBMS isn't quite smart enough to recognise OR
as a chance for parallelism, it may be smart enough to recognise UNION ALL
in that context, something like:
select column1 as c from t where column1 is not NULL
union all
select column2 as c from t where column1 is NULL
But again, it depends on both your database and your data. A smart DBA would put the whole thing in a stored procedure so they could swap in a new method seamlessly should the data change its properties.
On an MSSQL-Table (MSSQL 2000) with 13.000.000 entries and indexes on Col1 and Col2 i get the following results:
select top 1000000 * from Table1 with(nolock) where isnull(Col1,Col2) > '0'
-- Compile-Time: 4ms
-- CPU-Time: 18265ms
-- Elapsed-Time: 24882ms = ~25s
select top 1000000 * from Table1 with(nolock) where Col1 > '0' or (Col1 is null and Col2 > '0')
-- Compile-Time: 9ms
-- CPU-Time: 7781ms
-- Elapsed-Time: 25734 = ~26s
The measured values are subject to strong fluctuations base on the workload of the server.
The first statment need lesser time to compile but takes more cpu-time for excecution (culstered index scan).
Its important to know that many storage-engines have an optimizer who reorganize the statment for better results und executiontimes. Ultimately, both statements will rebuild to mostly the same statement by the optimizer.
I think, your replacement expression does not mean the same. Assume filterValue is 2
, then ISNULL(1,2)=2
is false, but 1=2 or 2=2
is true. The expression you need looks more like:
(c1=filter) or ((c1 is null) and (c2 = filter));
There is a chance that a server can answer this from the index on c1
. First part of the soultion is an index scan over c1=filter
. The second part is a scan over c1=null
and then a linear search for c2=filter
. I'd even say that a clustered index (c1,c2)
could work here.
OTOH, you should rather measure before make assumptions like this, speculations doesn't work usually in SQL unless you have intimate knowledge on the implementation. For example, I'm pretty sure that the query planners already knows that ISNULL(X,Y)
can be decomposed into a boolean statement with its implications for searching, but I would not rely on that but rather measure and then decide what to do.
I have measured the performance of both queries on SQL Sever 2008.
And have got the following results:
Both approaches had almost the same Estimated Subtree Cost metric. But the OR approach had more accurate value of the Estimated Number of Rows metric.
So the query optimizer will build more appropriate execution plan for the OR than for ISNULL approach.
精彩评论