Why is ISNULL twice as slow as COALESCE (same query)?
We ran into a strange situation on SQL Server 2008 (SP1) - 10.0.2531.0 (X64) - Win2008 SP2 (X64).
Here is one heavy query:
select t1.id, t2.id
from t1, t2
where
t1.id = t2.ext_id
and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
.... and about 10 more comparisons with Isnull
UPD: All columns in comparison (except IDs) are varchar(~30...200)
This query on a rather big Dev server runs for ~5 hours - this is slow, but what can we do?
While investigating possible ways of optimisation, we found that changing "isnull" to "coalesce" in the query above gives a double performance gain - the query now runs for ~2 hours.
UPD: When we remove all ISNULL checks and use just t1.vchCol1 = t2.vchCol1, the query finishes after ~40 mins.
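To restate, these are the three predicate variants compared, with the timings reported above (a side-by-side summary of one comparison line from the query):

-- variant 1 (~5 hours):
and isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
-- variant 2 (~2 hours):
and coalesce(t1.vchCol1, 'Null') = coalesce(t2.vchCol1, 'Null')
-- variant 3 (~40 minutes, but not NULL-safe: rows where both sides are NULL no longer match):
and t1.vchCol1 = t2.vchCol1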
The question is: is this known behavior, and should we avoid using ISNULL everywhere?
I wonder if you'd see an improvement by splitting the cases out explicitly:
...
AND ((t1.vchCol1 = t2.vchCol1) OR (t1.vchCol1 IS NULL AND t2.vchCol1 IS NULL))
AND ((t1.vchCol2 = t2.vchCol2) OR (t1.vchCol2 IS NULL AND t2.vchCol2 IS NULL))
...
Most of the articles you'll find on this subject seem to contradict your finding: ISNULL is (marginally) faster than COALESCE.
Differences between ISNULL and COALESCE
COALESCE basically translates to CASE expression and ISNULL is a built-in implemented in the database engine.
...
This will make a performance difference and queries with COALESCE often fare worse here.
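To see what that quote means in practice: COALESCE is defined in the SQL standard as a syntactic shortcut for a CASE expression, so each comparison in the question's query is effectively rewritten along these lines (a sketch of the expansion, not the engine's literal internal form):

-- coalesce(t1.vchCol1, 'Null') is shorthand for:
case when t1.vchCol1 is not null then t1.vchCol1 else 'Null' end

-- so one comparison line of the query expands to roughly:
and case when t1.vchCol1 is not null then t1.vchCol1 else 'Null' end
      = case when t2.vchCol1 is not null then t2.vchCol1 else 'Null' end

ISNULL, by contrast, is evaluated as a single built-in function, which is where the (usually marginal) difference comes from.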
ISNULL vs. COALESCE
I ran these tests several times on a few different servers, and
ISNULL appears to pretty consistently out-perform COALESCE by an average of 10 or 12 percent. But that's the difference between 6 seconds and 5.3 seconds (the approximate average runtimes per test on my servers), over the course of a million executions. Hardly worth the functionality and standards compliance sacrifice, at least in the scenarios I use these functions for.
COALESCE vs ISNULL vs IS NULL OR
the best performer is the
IS NULL OR case, while the difference between all 3 of them is minor.
You may want to consider adding a computed column to each table that holds a checksum value. Then, create an index on the ID column and the checksum value, and finally use the checksum value in the join. Something like this:
Alter Table T1 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Alter Table T2 Add CheckSumId As CHECKSUM(vchCol1, vchCol2, vchCol3)
Create NonClustered index idx_T1_Checksum On T1(id, CheckSumId)
Create NonClustered index idx_T2_Checksum On T2(ext_id, CheckSumId)
Then your query would become...
select t1.id, t2.id
from t1 Inner Join t2
On t1.id = t2.ext_id
And T1.CheckSumId = T2.CheckSumId
where isnull(t1.vchCol1, 'Null') = isnull(t2.vchCol1, 'Null')
and isnull(t1.vchCol2, 'Null') = isnull(t2.vchCol2, 'Null')
This will, of course, use extra index space, but it's simply 2 integers, which should be very efficient. Note that the ISNULL comparisons stay in the WHERE clause because CHECKSUM can collide (rows with different values can produce the same checksum), so the cheap integer join only pre-filters candidate pairs and the exact comparison confirms them. There will also be a performance penalty for each insert, update and delete, because another index needs to be maintained. However, I suspect this will be a large net win for this query.
I realize this is a year later, but...
For this kind of column-by-column comparison, you might consider using EXCEPT. Also, EXCEPT treats NULL like another value instead of "It could be anything!", as I like to call it.
"When you compare rows for determining distinct values, two NULL values are considered equal." --from http://msdn.microsoft.com/en-us/library/ms188055.aspx