Efficiently pulling thousands of records with SQL Server?

2023-01-11 18:21 问答作者：

This is kind of a part 2 to this question: Select rows and Update same rows for locking?

I have a table in my database that hold millions of records. I have created a stored procedure to pull X amount of records and mark them as being locked so when my application calls the next set of X amount it will only pull the records that are not locked and so and on and so on… I am finding that with millions of records this is not very efficient. The queries are taking a while to run. Does anyone have some suggestions to increase my query efficiency while keeping the same idea for locking records? Below is the stored procedure:

Set Rowcount @topCount
 SELECT *
 into #temp_candidates
 FROM dbo.Candidates
 Where isTested = 0 and
       isLocked = 0 and 
       validated = 0 and
       validationId is null
 UPDATE dbo.Candidates
 SET islocked = 1,
 validationId = @validationId,
 validationDateTime = @validationDateTime
 WHERE id IN (SELECT id FROM #temp_candidates)
 Select *
 from dbo.Candidates
 where id in (SELECT id FROM #temp_candidates) IF EXISTS (SELECT NULL FROM
  tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'temp开发者_运维百科db..#temp_candidates')) 
   BEGIN 
     DROP TABLE #temp_candidates
   END 
END

This kind of procedure are guaranteed to fail under concurrency. What you want to to achieve is to use a table as a queue, which is notoriously error prone, and there are well know tested and tried methods to use Use Tables as Queues. Right now you are on a path that leads nowhere fast. On your previous question you should had selected Cade's answer.

To dequeue items form the table (lock them for processing in the presence of concurrent threads trying to do the same) several conditions have to be met, all outlined in the article I linked, and the actual query is only one of them. The most important condition is to actually have a correct clustered index. The isLocked field must be the leftmost field in the clustered key, followed by the ordering criteria.

CREATE CLUSTERED INDEX cdxCandidates ON Candidates(isLocked, ...);

This clustered index is required for efficient dequeue operations. Then you must use the OUTPUT clause, is the only way to pull this out correctly:

WITH cte AS (
  SELECT TOP (@x) isLocked, ...
  FROM Candidates WITH (READPAST)
  WHERE isLocked = 0
  ORDER BY ...)
UPDATE cte
  SET isLocked = 1
  OUTPUT INSERTED.*;

The Query Processor understands that you're selecting for update and will acquire the rows in the table in a manner that prevent a concurrent thread from acquiring the same rows.

Incidentally this 'scheme' is exactly the scheme (the clustered index condition and the update with OUTPUT on top of a READPAST SELECT) is used by Service Broker's QUEUEs, which were designed specifically for high concurrent enqueue/dequeue operations. Other alternatives were considered this is the only one that has a decent chance of getting any performance while maintaining correctness

I'm not sure I understand the need for the temp database. Why not just set isLocked = 1 for whatever records you wish (inside a transaction). Do whatever you want to them. Then set isLocked = 1 where isLocked = 0 and repeat.

继续阅读：sql sql-server-2008 stored-procedures

Efficiently pulling thousands of records with SQL Server?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？