开发者

Why do SUBSTRING or LEFT make the query much slower?

I have a blacklist of people that should never be contacted. When I want to see if a person is in this list, I do the following:

-- Query 1
SELECT * 
FROM bldb.dbo.blacklist l
WHERE l.matchcode
    = dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert')

The query runs very fast, since there is an index on matchcode column, and fn_matchcode is deterministic.

Think of matchcode as a compressed form of address and name, which helps me not to be affected from typos in street names etc. It consists of 22 chars: 13 for the address, 9 for the name. When I want to see if any person in 1 Sesame Street, 12345 is in blacklist, I do the following:

-- Query 2
SELECT * 
FROM bldb.dbo.blacklist l
WHERE LEFT(l.matchcode,13)
    = LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13)

This runs extremely long...

On the contrary, this runs much faster:

-- Query 3
SELECT * 
FROM bldb.dbo.blacklist l
WHERE LEFT(l.matchcode,13)
    = (SELECT LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13))

That means, the right hand side of the where condition gets calculated for every row! But why? The UDF is deterministic. Is it LEFT(), which is not deterministic?

EDIT:

The answers so far claimed that it is because the index does not get 开发者_开发百科used. However, it is still not clear to me why the following happens.

When I write the query like this:

-- Query 4
SELECT * 
FROM bldb.dbo.blacklist
WHERE matchcode LIKE LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'

it still takes several minutes to finish. Please note that the fn_matchcode just does some string manipulation and returns instantly.

When I hardcode the result of fn_matchcode into the query:

-- Query 5
SELECT * 
FROM bldb.dbo.blacklist
WHERE matchcode LIKE '12345SSMSTRT1%'

it takes a couple of milliseconds! How would you explain that?


Following the update in your question can you look at the two execution plans for your queries #4 and #5 and see whether it does a clustered index scan for one and a non clustered index seek for the other? I wonder if it is because it knows the statistics for the literal at compile time but not for the function call. As it has no idea that only a handful of records will be returned it errs on the side of caution to avoid doing a whole load of bookmark lookups.

If that is the case then does the following help?

SELECT * 
FROM bldb.dbo.blacklist WITH (FORCESEEK)
WHERE matchcode LIKE 
  LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'


I'd use

SELECT * 
FROM bldb.dbo.blacklist l
WHERE l.matchcode LIKE 
  LEFT(dbo.fn_matchcode('12345','Sesame Street','1','Eddie','Bert'),13) + '%'

To allow the index on matchcode to be used. That doesn't answer your question though but is too long for a comment.


This is an issue of sargability - when you use a function like left in the where clause, the query can't use the indexes which slows it down.


That means, the right hand side of the where condition gets calculated for every row!

No, you're thinking of correlated subqueries - which this is not.

The last example is fast because the optimizer sees it as a join (due to the SELECT), vs the WHERE clause in the previous examples.

While using a function on a column will render an index on that column useless, cutting down the number of characters on a VARCHAR column (say first 10 of VARCHAR(150)) could be faster due to less to compare. INTs are 4 bytes regardless, but that's not the case for string based data types...


For a query to run fast, there must be an index for the thing being queried. If you are going to be querying based upon a value computed from one or more columns, you need to have an indexed column containing the result of that computation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜