Character-matching queries in SQL

I'm attempting to optimize a T-SQL stored procedure I have. It's for pulling records based on a VIN (a 17-character alphanumeric string); usually people only know a few of the digits—e.g. the first digit could be '1', '2', or 'J'; the second is 'H' but the third could be 'M' or 'G'; and so on.

This leads to a pretty convoluted query whose WHERE clause is something like

WHERE SUBSTRING(VIN,1,1) IN ('J','1','2')
AND SUBSTRING(VIN,2,1) IN ('H')
AND SUBSTRING(VIN,3,1) IN ('M','G')
AND SUBSTRING(VIN,4,1) IN ('E')
AND ... -- and so on for however many digits we need to search on

The table I'm querying on is huge (millions of records) so the queries I'm running that have this kind of WHERE clause can take hours to run if there are more than a couple digits being searched on, even if I'm only requesting the top 3000 records. I feel like there has to be a way to get this substring character matching to run faster. Hours are completely unacceptable; I'd like to have these kinds of queries run in just a few minutes.

I don't have any editing privileges on the database, sadly, so I can't add indexes or anything like that; all I can do is change my stored procedure (although I can try to beg the DBAs to modify the table).


You can use

WHERE VIN LIKE '[J12]H[MG]E%'

At least that should hopefully lead to 3 index seeks on the ranges JH%, 1H%, and 2H% rather than a full scan.

Edit: Testing locally, I found that it does not do multiple index seeks as I had hoped. Instead it converts the above to a single seek on the larger range VIN >= '1' AND VIN < 'K', with a residual predicate to evaluate the LIKE.

I'm not sure whether it will do the same on larger tables, but if it does, it may well be worth trying to encourage the multiple-seek plan with

WHERE (VIN LIKE 'JH%' OR  VIN LIKE '1H%' OR  VIN LIKE '2H%') 
        AND VIN LIKE '[J12]H[MG]E%'
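
If the known characters vary per call, the OR'd prefix list above can be generated rather than typed by hand. The following is only a sketch in Python (not T-SQL) of that expansion; the function names, the `depth` parameter, and the list-of-lists input shape are all hypothetical, and it assumes every character is a plain alphanumeric, so no LIKE escaping is needed.

```python
from itertools import product

def like_prefixes(position_sets, depth):
    """Expand the first `depth` positions into every literal prefix, e.g. 'JH%'."""
    leading = position_sets[:depth]
    for chars in leading:
        # Assumption: VIN characters are single alphanumerics, so no escaping is needed.
        if not chars or not all(len(c) == 1 and c.isalnum() for c in chars):
            raise ValueError("each leading position needs one or more alphanumeric characters")
    return ["".join(combo) + "%" for combo in product(*leading)]

def seekable_where(position_sets, depth=2, column="VIN"):
    """Build the parenthesised OR list of prefix LIKE predicates."""
    ors = " OR ".join(f"{column} LIKE '{p}'" for p in like_prefixes(position_sets, depth))
    return f"({ors})"

sets = [["J", "1", "2"], ["H"], ["M", "G"], ["E"]]
print(seekable_where(sets))
# (VIN LIKE 'JH%' OR VIN LIKE '1H%' OR VIN LIKE '2H%')
```

The number of prefixes is the product of the set sizes for the expanded positions, so a small `depth` is probably the sensible choice.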


You could use the LIKE keyword

SELECT
  *
FROM Table
WHERE VIN LIKE '[J12]H[MG]E%'

This would even let you handle cases where they only know that the second character is not 'A', by using [^A] in the pattern, such as:

WHERE VIN LIKE '[J12][^A][MG]E%'

Reference http://msdn.microsoft.com/en-us/library/ms179859.aspx
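
Since the known characters differ from search to search, the pattern string itself could be assembled from the per-position input. This is a rough Python sketch of that idea, not part of the stored procedure; `position_token`, `build_like_pattern`, and the (allowed, excluded) pair format are hypothetical names chosen for illustration, and it assumes only alphanumeric characters are supplied.

```python
def position_token(allowed=None, excluded=None):
    """Return the LIKE token for one VIN position."""
    chars = allowed or excluded
    if not chars:
        return "_"  # nothing known: match any single character
    # Assumption: only single alphanumeric characters, so no LIKE escaping is needed.
    if not all(len(c) == 1 and c.isalnum() for c in chars):
        raise ValueError("only single alphanumeric characters are supported")
    body = "".join(chars)
    if allowed:
        # One allowed character is a plain literal; several become a [..] set.
        return body if len(body) == 1 else f"[{body}]"
    return f"[^{body}]"  # excluded characters become a negated set

def build_like_pattern(positions):
    """positions: (allowed, excluded) pairs, one per leading VIN position."""
    return "".join(position_token(a, e) for a, e in positions) + "%"

print(build_like_pattern([("J12", None), ("H", None), ("MG", None), ("E", None)]))
# [J12]H[MG]E%
print(build_like_pattern([("J12", None), (None, "A"), ("MG", None), ("E", None)]))
# [J12][^A][MG]E%
```

The resulting string would be best passed to the query as a parameter (for example WHERE VIN LIKE @pattern) rather than concatenated into the SQL text.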


I like the LIKE answers, but here's another alternative (especially if your input isn't always the same).

I would do this as a series of queries on ever-smaller temp tables (Yes, I'm in love with temp tables- sue me.)

So I would do something like

SELECT [Fields]
INTO #tempResultsFirstTwoDigits
FROM VIN
WHERE [Clause]

Then keep moving down the chain digit by digit until you've searched each of the provided characters. So you might do this:

IF LEN(@input) > 2
BEGIN
    -- Narrow the previous, smaller result set instead of rescanning the base table
    SELECT [Fields]
    INTO #tempResultsThreeDigits
    FROM #tempResultsFirstTwoDigits
    WHERE SUBSTRING(VIN, 3, 1) = SUBSTRING(@input, 3, 1)
    -- NOTE: That WHERE clause might be sped up by initializing a variable at
    --       the beginning of the SP for each character you got.
END
ELSE
BEGIN
    SELECT * FROM #tempResultsFirstTwoDigits
    GOTO Stop -- where "Stop" is a label at the end of the SP that skips any further checks
END

Again, LIKE might be a better answer for you, but I would try both approaches and benchmark both of them.
