Need some tSQL Wizardry: SQL Update Based on Running Total
I've got an implementation for this that uses a super hairy recursive CTE that is really hard to follow/maintain. I was hoping that one of the brains on SO could come up with some more straightforward tSQL approach code to accomplish the following:
Table Documents
DocID SortOrder PageCount StartPgNum EndPgNum
5 1 2 {1} {2}
8 2 7 {3} {9}
22 3 3 {10} {12}
Fo开发者_开发知识库r the table given above, I need a query to populate StartPgNum and EndPgNum (Sample values included in the example in {} to make the intentions clearer for what I need.
Assumptions:
* DocID, SortOrder, and PageCount are pre-populated. * StartPgNum and EndgNum need to be populated by tSQL code. * SortOrder always starts at 1, and is continuous with no gaps. * Documents should get a continuous page numbering scheme as ordered by SortOrderUpdated to be better :)
DECLARE @temp TABLE (DocID INT, SortOrder INT, PageCount INT)
INSERT INTO @temp VALUES (5, 1, 2)
INSERT INTO @temp VALUES (8, 2, 7)
INSERT INTO @temp VALUES (22, 3, 3)
SELECT
*,
StartPgNum + PageCount-1 AS EndPgNum
FROM
(SELECT
DocID,
SortOrder,
PageCount,
ISNULL((SELECT SUM(PageCount)+1 FROM @temp WHERE SortOrder < parent.SortOrder), 1) AS StartPgNum
FROM
@temp parent) _temp
I did some testing on all of the solutions provided here in the other answers, my original "Hairy Recursive CTE" option and for the sake of completeness a simple cursor based approach. To my great surprise the cursor option performed the best by a clear margin in all my tests (1K Rows, 10KRows, 50K Rows, 500K Rows)
Here are the average times for each approach for 10K records:
Hairy Recursive CTE: 3 minutes 55 seconds
CROSS APPLY (Ben Dempsey): 21-25 seconds
SUBSELECTS (Tim Khouri): 19-21 seconds
CURSOR: 1-2 Seconds
Here is my cursor based solution:
Declare @temp TABLE(
DocID INT PRIMARY KEY NOT NULL,
SortOrder INT NOT NULL,
PageCount INT NOT NULL,
BegPg int,
EndPg int
)
Insert into @temp (DocID,SortOrder,PageCount)
SELECT top 50000 docid, ROW_NUMBER() OVER (ORDER BY DOCID),Pages FROM tblDocuments
DECLARE @PC int
SET @PC=1
DECLARE @FetchPageCount int
DECLARE @FetchDocID int
DECLARE myCursor CURSOR FOR
SELECT DocID, PageCount FROM @temp ORDER BY SortOrder
OPEN myCursor
FETCH NEXT FROM myCursor INTO @FetchDocID,@FetchPageCount
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE @temp SET BegPg=@PC, EndPg=@PC+ @FetchPageCount-1
WHERE (Docid=@fetchDocid)
SET @PC = @PC + @FetchPageCount
FETCH NEXT FROM myCursor INTO @FetchDocID,@FetchPageCount
END
CLOSE myCursor
DEALLOCATE myCursor
SELECT * FROM @temp
Who would have guessed it? Maybe cursors aren't always evil.
A word of warning: Lest you be tempted to replace the update to the "WHERE CURRENT OF myCursor" syntax, it performed much slower than using the current version with a where clause, although still faster than most of the other approaches.
The fastest way to do it would be with a Quirky Update. It depends whether you fall into the 'Microsoft don't explicitly say it works so I'll avoid' it camp or not...
Otherwise you're in hairy recursive CTE (as you've already discovered) or triangular join (which could become a nightmare on a large data set) territory.
Maybe one of these three solutions can help, since this is a kind of "running total" problem: http://www.sqlteam.com/article/calculating-running-totals
SQL 2008 using cross apply (running total)
/*
DocID SortOrder PageCount StartPgNum EndPgNum
5 1 2 {1} {2}
8 2 7 {3} {9}
22 3 3 {10} {12}
*/
Declare @MyTable TABLE(
DocID int,
SortOrder int,
PageCount int
)
Insert into @MyTable(DocID,SortOrder,PageCount)
values (5,1,2), (8,2,7), (22,3,3)
select
T1.docID,
T1.Sortorder,
T1.Pagecount,
(T.RunningTotal - T1.Pagecount) + 1 StartPgNum ,
T.RunningTotal EndPgNum
FROM @MyTable T1
CROSS APPLY ( Select SUM(PageCount) RunningTotal FROM @MyTable where SortOrder <= T1.SortOrder) T
order by T1.sortorder
I chose to tackle these two problems by creating functions, one to get the first page the second to get the last page. Here are the functions and the query which will work.
CREATE FUNCTION dbo.GetFirstPage(@SortOrder int)
RETURNS int
as
BEGIN
DECLARE @FirstPage int
SET @FirstPage = 1
IF(@SortOrder > 1)
BEGIN
SELECT @FirstPage = SUM(PageCount) + 1
FROM Documents
WHERE SortOrder < @SortOrder
END
RETURN @FirstPage
END
CREATE FUNCTION dbo.GetLastPage(@FirstPage int, @PageCount int)
RETURNS int
AS
BEGIN
RETURN (@FirstPage + @PageCount -1)
END
And finally the query.
SELECT * ,
dbo.GetFirstPage(SortOrder) AS FirstPage,
dbo.GetLastPage(dbo.GetFirstPage(SortOrder),Pagecount) AS LastPage
FROM Documents
精彩评论