开发者

SQL Query - 20mil records - Best practice to return information

I have a SQL database that has the following table:

Table开发者_Python百科: PhoneRecords
--------------
ID(identity Seed)
FirstName
LastName
PhoneNumber
ZipCode

Very simple straight forward table. This table has over 20million records. I am looking for the best way to do queries that pull out records based off area codes from the table. For instance here is an example query that I have done.

SELECT phonenumber, firstname
FROM [PhoneRecords]
WHERE (phone LIKE '2012042%') OR
      (phone LIKE '2012046%') OR
      (phone LIKE '2012047%') OR
      (phone LIKE '2012083%') OR
      (phone LIKE '2012088%') OR
      (phone LIKE '2012841%')

As you can see this is an ugly query, but it would get the job done (if I wasn't running into timeout issues)

Can anyone tell me the best way for speed/optimization to do the above query to display the results? Currently that query above takes around 2 hours to complete on a 9gb 1600mhz ram, i7 930 quadcore OC'd 4.01ghz. I obviously have the computer power required to do such a query, but still takes too long for queries.


You are probably missing an index on the phonenumber column.

CREATE INDEX IX_PHONERECORDS_PHONENUMBER_FIRSTNAME 
  ON dbo.PhoneRecords (PhoneNumber) INCLUDE (FirstName)

If that does not help, post the execution plan (CTRL+M).


First you need an index on the column phone. If you don't have one, add it.

If it still runs slowly you might try to use UNION ALL instead of OR as this can be easier for the optimizer to work with. This works because the way you have constructed your conditions guarantee that results will be distinct. So your query could be rewritten as:

SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012042%'
UNION ALL
SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012046%'
UNION ALL
SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012047%'
UNION ALL
SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012083%'
UNION ALL
SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012088%'
UNION ALL
SELECT phonenumber, firstname FROM [PhoneRecords] WHERE phone LIKE '2012041%'

This query should be able to use the index to run efficiently.

You should look at the execution plan before running the actual query and make sure that there is no TABLE SCAN or INDEX SCAN.


Do you have any indexes? A first step is to put an index on the PhoneNumber column. If that isn't enough (I don't know the exact details of searching on part of strings in indexed columns) I would suggest adding another column named "AreaCode" which can be automatically computed from the PhoneNumber column. Then you can add an index on the AreaCode column.


The first and very obvious question is do you have indexes? You need to create indexes on at least phone number if you are going to be query against it. You should probably create an covering index which includes the fields you want and the fields that are in the where clause so the computer doesn't have to waste time fetching the row after it has found in the index to get to the information you want. Obviously the flip side to that is the bigger your index the slower your query.


You may split your phone number column : [Area Code], [Phone Number]

Then, if this query is the "most important" in your application for this table and the ratio returned rows/total rows is high, add a CLUSTERED index on [Area Code] otherwise, add a standard index.

You may also keep the Phone Number column as is and index it directly, it depends on your app.


First I would split the phone column to "Area code" and "Phone number".

Also, I would convert this numbers to int; Indexes will perform faster.

AreaCode = 2012042

should be much faster then

PhoneNumber LIKE '2012042%'


Even if you are doing a table scan (and it can happen even if you have an index, if the selectivity is low) your query should execute way faster than 2 hours. Your table is small enough to fit entirely in the sql server buffer pools, if there is no competition with others tables scanned by another queries and if the sqlserver max memory is large enough. So while you can do some tricks like adding indexes or splitting the phone number in area+phone you should investigate the sql server configuration and also your system configuration.


http://igoro.com/archive/precomputed-view-a-cool-and-useful-sql-pattern

create a materialized view which includes the first n numbers of the phone number as it's own column. Then you can query against the area code column and include the names. Precompute the area codes so it doesn't have to be done on every select. Don't use the or operator if you can help it. Use union to help the query plan use the index.

As it is, the query you're running will do 20,000,000 times x comparisons, where x is the number of area codes you're searching for every time you do the select. By querying an exact indexed column, you won't need to go to the table at all and the index can be searched in an efficient manner O(log n) I think.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜