开发者

Efficiently pulling thousands of records with SQL Server?

This is kind of a part 2 to this question: Select rows and Update same rows for locking?

I have a table in my database that hold millions of records. I have created a stored procedure to pull X amount of records and mark them as being locked so when my application calls the next set of X amount it will only pull the records that are not locked and so and on and so on… I am finding that with millions of records this is not very efficient. The queries are taking a while to run. Does anyone have some suggestions to increase my query efficiency while keeping the same idea for locking records? Below is the stored procedure:

Set Rowcount @topCount
 SELECT *
 into #temp_candidates
 FROM dbo.Candidates
 Where isTested = 0 and
       isLocked = 0 and 
       validated = 0 and
       validationId is null
 UPDATE dbo.Candidates
 SET islocked = 1,
 validationId = @validationId,
 validationDateTime = @validationDateTime
 WHERE id IN (SELECT id FROM #temp_candidates)
 Select *
 from dbo.Candidates
 where id in (SELECT id FROM #temp_candidates) IF EXISTS (SELECT NULL FROM
  tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'temp开发者_运维百科db..#temp_candidates')) 
   BEGIN 
     DROP TABLE #temp_candidates
   END 
END


This kind of procedure are guaranteed to fail under concurrency. What you want to to achieve is to use a table as a queue, which is notoriously error prone, and there are well know tested and tried methods to use Use Tables as Queues. Right now you are on a path that leads nowhere fast. On your previous question you should had selected Cade's answer.

To dequeue items form the table (lock them for processing in the presence of concurrent threads trying to do the same) several conditions have to be met, all outlined in the article I linked, and the actual query is only one of them. The most important condition is to actually have a correct clustered index. The isLocked field must be the leftmost field in the clustered key, followed by the ordering criteria.

CREATE CLUSTERED INDEX cdxCandidates ON Candidates(isLocked, ...);

This clustered index is required for efficient dequeue operations. Then you must use the OUTPUT clause, is the only way to pull this out correctly:

WITH cte AS (
  SELECT TOP (@x) isLocked, ...
  FROM Candidates WITH (READPAST)
  WHERE isLocked = 0
  ORDER BY ...)
UPDATE cte
  SET isLocked = 1
  OUTPUT INSERTED.*;

The Query Processor understands that you're selecting for update and will acquire the rows in the table in a manner that prevent a concurrent thread from acquiring the same rows.

Incidentally this 'scheme' is exactly the scheme (the clustered index condition and the update with OUTPUT on top of a READPAST SELECT) is used by Service Broker's QUEUEs, which were designed specifically for high concurrent enqueue/dequeue operations. Other alternatives were considered this is the only one that has a decent chance of getting any performance while maintaining correctness


I'm not sure I understand the need for the temp database. Why not just set isLocked = 1 for whatever records you wish (inside a transaction). Do whatever you want to them. Then set isLocked = 1 where isLocked = 0 and repeat.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜