Find the longest sequence of a value in a table
This is an SQL Question, I think it is 开发者_开发技巧difficult one - I'm not sure it is possible to achieve in a simple SQL sentence or a stored procedure:
I want to find the number of the longest sequence of the same (known) number in a column in a table:
example:
TABLE:
DATE SALEDITEMS
1/1/09 4
1/2/09 3
1/3/09 3
1/4/09 4
1/5/09 3
calling the sp/sentence for 4 will give 1 calling the sp/sentecne for 3 will give 2 as there was 2 times in a row number 3.
I'm running SQL server 2008.
UPDATE: I generated a million rows of random data, and abandoned the recursive CTE solution, as its query plan didn't make good use of indexes in the optimizer.
But the non-recursive solution I originaly posted turned out to work great, as long as there was an additional non-clustered index on (SALEDITEMS, [DATE]). This makes sense, since the query needs to filter in both directions (both by date and by SALEDITEMS). With this additional index, queries on a million rows return in under 2 seconds on my (not very beefy) desktop mathine. Without this index, the query was dog-slow.
BTW, this is a great example of how SQL Server's cost-based query optimization totally breaks down in some cases. The recursive CTE solution has a cost (on my PC) of 42 and takes at least several minutes to finish. The non-recursive solution has a cost of 15,446 (!!!) and completes in 1.5 seconds. Moral of the story: when comparing SQL Server query plans, don't assume that cost necessarily correlates to query performance!
Anyway, here's the solution I'd recommend (the same non-recursive CTE I posted earlier) :
DECLARE @SALEDITEMS INT = 3;
WITH SalesNoMatch ([DATE], SALEDITEMS, NoMatchDate)
AS
(
SELECT [DATE], SALEDITEMS,
(SELECT MIN([DATE]) FROM Sales s2 WHERE s2.SALEDITEMS <> @SALEDITEMS
AND s2.[DATE] > s1.[DATE]) as NoMatchDate
FROM Sales s1
)
, SalesMatchCount ([DATE], ConsecutiveCount) AS
(
SELECT [DATE], 1+(SELECT COUNT(1) FROM Sales s2 WHERE s2.[DATE] > s1.[DATE] AND s2.[DATE] < NoMatchDate)
FROM SalesNoMatch s1
WHERE s1.SALEDITEMS = @SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesMatchCount;
Here's the DDL I used to test this, including indexes you'll need:
CREATE TABLE [Sales](
[DATE] date NOT NULL,
[SALEDITEMS] int NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX IX_Sales ON Sales ([DATE]);
CREATE UNIQUE NONCLUSTERED INDEX IX_Sales2 ON Sales (SALEDITEMS, [DATE]);
And here's how I created my test data-- 1,000,001 rows with ascending dates with SALEDITEMS randomly set between 1 and 10.
INSERT INTO Sales ([DATE], SALEDITEMS)
VALUES ('1/1/09', 5)
DECLARE @i int = 0;
WHILE (@i < 1000000)
BEGIN
INSERT INTO Sales ([DATE], SALEDITEMS)
SELECT DATEADD (d, 1, (SELECT MAX ([DATE]) FROM Sales)), ABS(CHECKSUM(NEWID())) % 10 + 1
SET @i = @i + 1;
END
Here's the recursive-CTE solution that I abandoned: DECLARE @SALEDITEMS INT = 3;
-- recursive CTE solution (remember to set MAXRECURSION!)
WITH SalesRowNum ([DATE], SALEDITEMS, RowNum)
AS
(
SELECT [DATE], SALEDITEMS, ROW_NUMBER() OVER (ORDER BY s1.[DATE]) as RowNum
FROM Sales s1
)
, SalesCTE (RowNum, [DATE], ConsecutiveCount)
AS
(
SELECT s1.RowNum, s1.[DATE], 1 AS ConsecutiveCount
FROM SalesRowNum s1
WHERE SALEDITEMS = @SALEDITEMS
UNION ALL
SELECT s1.RowNum, s1.[DATE], ConsecutiveCount + 1 AS ConsecutiveCount
FROM SalesRowNum s1
INNER JOIN SalesCTE s2 ON s1.RowNum = s2.RowNum + 1
WHERE SALEDITEMS = @SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesCTE;
Untested, because you did not provide DDL and sample data:
DECLARE @SALEDITEMS INT;
SET @SALEDITEMS=3;
SELECT MAX(cnt) FROM(
SELECT COUNT(*) FROM YourTable JOIN (
SELECT y1.[Date] AS d1, y2.[Date] AS d2
FROM YourTable AS y1 JOIN YourTable AS y2
ON y1.SALEDITEMS=@SALEDITEMS AND y2.SALEDITEMS=@SALEDITEMS
AND NOT EXISTS(SELECT 1 FROM YourTable AS y
WHERE y.SALEDITEMS<>@SALEDITEMS
AND y1.[Date] < y.[Date] AND y.[Date] < y2.[Date])
) AS t
WHERE [Date] BETWEEN t.d1 AND t.d2
) AS t;
精彩评论