Efficiently pulling thousands of records with SQL Server?
This is kind of a part 2 to this question: Select rows and Update same rows for locking?
I have a table in my database that hold millions of records. I have created a stored procedure to pull X amount of records and mark them as being locked so when my application calls the next set of X amount it will only pull the records that are not locked and so and on and so on… I am finding that with millions of records this is not very efficient. The queries are taking a while to run. Does anyone have some suggestions to increase my query efficiency while keeping the same idea for locking records? Below is the stored procedure:
Set Rowcount @topCount
SELECT *
into #temp_candidates
FROM dbo.Candidates
Where isTested = 0 and
isLocked = 0 and
validated = 0 and
validationId is null
UPDATE dbo.Candidates
SET islocked = 1,
validationId = @validationId,
validationDateTime = @validationDateTime
WHERE id IN (SELECT id FROM #temp_candidates)
Select *
from dbo.Candidates
where id in (SELECT id FROM #temp_candidates) IF EXISTS (SELECT NULL FROM
tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'temp开发者_运维百科db..#temp_candidates'))
BEGIN
DROP TABLE #temp_candidates
END
END
This kind of procedure are guaranteed to fail under concurrency. What you want to to achieve is to use a table as a queue, which is notoriously error prone, and there are well know tested and tried methods to use Use Tables as Queues. Right now you are on a path that leads nowhere fast. On your previous question you should had selected Cade's answer.
To dequeue items form the table (lock them for processing in the presence of concurrent threads trying to do the same) several conditions have to be met, all outlined in the article I linked, and the actual query is only one of them. The most important condition is to actually have a correct clustered index. The isLocked
field must be the leftmost field in the clustered key, followed by the ordering criteria.
CREATE CLUSTERED INDEX cdxCandidates ON Candidates(isLocked, ...);
This clustered index is required for efficient dequeue operations. Then you must use the OUTPUT clause, is the only way to pull this out correctly:
WITH cte AS (
SELECT TOP (@x) isLocked, ...
FROM Candidates WITH (READPAST)
WHERE isLocked = 0
ORDER BY ...)
UPDATE cte
SET isLocked = 1
OUTPUT INSERTED.*;
The Query Processor understands that you're selecting for update and will acquire the rows in the table in a manner that prevent a concurrent thread from acquiring the same rows.
Incidentally this 'scheme' is exactly the scheme (the clustered index condition and the update with OUTPUT on top of a READPAST SELECT) is used by Service Broker's QUEUEs, which were designed specifically for high concurrent enqueue/dequeue operations. Other alternatives were considered this is the only one that has a decent chance of getting any performance while maintaining correctness
I'm not sure I understand the need for the temp database. Why not just set isLocked = 1 for whatever records you wish (inside a transaction). Do whatever you want to them. Then set isLocked = 1 where isLocked = 0 and repeat.
精彩评论