SQL query to calculate running group counts on time-phased data
I have some data, like this:
BUG DATE STATUS
---- ---------------------- --------
9012 18/03/2008 9:08:44 AM OPEN
9012 18/03/2008 9:10:03 AM OPEN
9012 28/03/2008 4:55:03 PM RESOLVED
9012 28/03/2008 5:25:00 PM CLOSED
9013 18/03/2008 9:12:59 AM OPEN
9013 18/03/2008 9:15:06 AM RESOLVED
9013 18/03/2008 9:16:44 AM CLOSED
9014 18/03/2008 9:17:54 AM OPEN
9014 18/03/2008 9:18:31 AM RESOLVED
9014 18/03/2008 9:19:30 AM CLOSED
9015 18/03/2008 9:22:40 AM OPEN
9015 18/03/2008 9:23:03 AM RESOLVED
9015 19/03/2008 12:27:08 PM CLOSED
9016 18/03/2008 9:24:20 AM OPEN
9016 18/03/2008 9:24:35 AM RESOLVED
9016 19/03/2008 12:28:14 PM CLOSED
9017 18/03/2008 9:25:47 AM OPEN
9017 18/03/2008 9:26:02 AM RESOLVED
9017 19/03/2008 12:30:30 PM CLOSED
Which I would like to transform into something like this:
DATE OPEN RESOLVED CLOSED
---------------------- -------- -------- --------
18/03/20开发者_高级运维08 9:08:44 AM 1 0 0
18/03/2008 9:12:59 AM 2 0 0
18/03/2008 9:15:06 AM 1 1 0
18/03/2008 9:16:44 AM 1 0 1
18/03/2008 9:17:54 AM 2 0 1
18/03/2008 9:18:31 AM 1 1 0
18/03/2008 9:19:30 AM 1 0 2
18/03/2008 9:22:40 AM 2 0 2
18/03/2008 9:23:03 AM 1 1 2
18/03/2008 9:24:20 AM 2 1 2
18/03/2008 9:24:35 AM 1 2 2
18/03/2008 9:25:47 AM 2 2 2
18/03/2008 9:26:02 AM 1 3 2
19/03/2008 12:27:08 PM 1 2 3
19/03/2008 12:28:14 PM 1 1 4
19/03/2008 12:30:30 PM 1 0 5
28/03/2008 4:55:03 PM 0 1 5
28/03/2008 5:25:00 PM 0 0 6
i.e. keeping running counts of bugs with each status.
This is easy enough to code up using cursors, but I'm wondering if any of you SQL gurus out there can help with a query to achieve this?
Ideally for mysql
, but I'm curious to see anything that will work.
With T as
(
Select B.BugDate As MainDate, C.BugID, C.BugDate ,C.Status,row_number() over(partition by B.BugDate,C.BugID order by C.BugDate Desc) As RowNum From Bug B Cross Join Bug C where B.BugDate >= C.BugDate Group By B.BugDate,C.BugID,C.BugDate, C.Status
)
Select T.MainDate ,SUM(CASE WHEN T.Status = 'OPEN' THEN 1 ELSE 0 END) As SUM_OPEN,SUM(CASE WHEN T.Status = 'RESOLVED' THEN 1 ELSE 0 END) As SUM_RESOLVED
,SUM(CASE WHEN T.Status = 'CLOSED' THEN 1 ELSE 0 END) As SUM_CLOSED
From T
where T.RowNum = 1
Group By T.MainDate
Order By T.MainDate
My spontaneous way to do that would not be in the SQL, since it'd require some looping to know the state of each category given the time of the next row. Instead, I'd fetch the original data and loop it through, modifying the amount of each category based on the STATUS-field of each row, and then outputting. If you'd like an example in, say, php, I could give you one.
Another way would be to cache this data in a separate table, and updating it whenever there's a new action. This way you can fetch it faster.
As you had said, you expect answer in any SQL implementation, so i am providing an answer for SQL Server 2005 and above, as i am using Windows function (Row_Number) and CROSS APPLY features.
The Idea is to get the last status of a bug for a give time.
Solution:
Select BugDate,
SUM(Case When RowNum = 1 AND Status = 'OPEN' Then 1 Else 0 End) As Opened,
SUM(Case When RowNum = 1 AND Status = 'RESOLVED' Then 1 Else 0 End) As Resolved,
SUM(Case When RowNum = 1 AND Status = 'CLOSED' Then 1 Else 0 End) As Closed
FROM
(
Select B1.BugDate,T2.BugId,T2.RowNum,T2.[Status]
from Bug B1
CROSS APPLY
(
SELECT T1.BugDate,T1.BugId, B.[Status],
ROW_NUMBER() Over(Partition by B1.BugDate, T1.BugId Order By Case when B.[Status] IS NULL THEN 1000 ELSE 1 END,T1.BugDate Desc) AS RowNum
FROM
(
Select BugDate,BugId
FROM
(
Select Distinct BugDate
from Bug
) D
CROSS JOIN
(
Select Distinct BugId
FROm Bug
) AS I
)T1
LEFT OUTER JOIN BUG B ON T1.BugDate = B.BugDate and T1.BugId = B.BugId
WHERE T1.BugDate <= B1.BugDate --AND T1.BugId = B1.BugId
) T2
)T1
GROUP BY BugDate
Order By BugDate
I created a same dataset as you have provided in sample question and the results i got are:
Date Op Re Cl
2008-03-18 09:08:44.000 1 0 0
2008-03-18 09:10:03.000 1 0 0
2008-03-18 09:12:59.000 2 0 0
2008-03-18 09:15:06.000 1 1 0
2008-03-18 09:16:44.000 1 0 1
2008-03-18 09:17:54.000 2 0 1
2008-03-18 09:18:31.000 1 1 1
2008-03-18 09:19:30.000 1 0 2
2008-03-18 09:22:40.000 2 0 2
2008-03-18 09:23:03.000 1 1 2
2008-03-18 09:24:20.000 2 1 2
2008-03-18 09:24:35.000 1 2 2
2008-03-18 09:25:47.000 2 2 2
2008-03-18 09:26:02.000 1 3 2
2008-03-19 12:27:08.000 1 2 3
2008-03-19 12:28:14.000 1 1 4
2008-03-19 12:30:30.000 1 0 5
2008-03-28 16:55:03.000 0 1 5
2008-03-28 17:25:00.000 0 0 6
Note: You are missing line 2 in your result set and also row for date 18/03/2008 9:18:31 AM is also not correct in your result.
Sub queries will work, but is it only three statuses that you have? and are they hardcoded or do they vary
EDIT
select <TableName>.DATE, (SELECT BUGS FROM CountTable WHERE DATE = <TableName>.DATE AND Status = 'OPEN') as 'OPEN'
, (SELECT BUGS FROM CountTable WHERE DATE = <TableName>.DATE ANDStatus = 'CLOSED') as 'CLOSED'
, (SELECT BUGS FROM CountTable WHERE DATE = <TableName>.DATE ANDStatus = 'RESOLVED') as 'RESOLVED'
FROM
<TableName>,
( SELECT DATE, STATUS, COUNT(BUG) as BUGS
FROM <TableName>
GROUP BY DATE,STATUS) as CountTable
WHERE CountTable.DATE = <TableName>.DATE
Alternatively, for a simpler solution
SELECT DATE, (SELECT count(BUGS) FROM <TableName> bugs2 WHERE Status = 'OPEN' AND bugs.DATE = bugs2.DATE) as 'OPEN'
, (SELECT count(BUGS) FROM <TableName> bugs2 WHERE Status = 'CLOSED' AND bugs.DATE = bugs2.DATE) as 'CLOSED'
, (SELECT count(BUGS) FROM <TableName> bugs2 WHERE Status = 'RESOLVED' AND bugs.DATE = bugs2.DATE) as 'RESOLVED'
FROM <TableName> bugs
group by DATE
精彩评论