Find the start and end date (set based) in T-SQL
I have the below.
Name Date
A 2011-01-01 01:00:00.000
A 2011-02-01 02:00:00.000
A 2011-03-01 03:00:00.000
B 2011-04-01 04:00:00.000
A 2011-05-01 07:00:00.000
The desired output is
Name StartDate EndDate
-------------------------------------------------------------------
A 2011-01-01 01:00:00.000 2011-04-01 04:00:00.000
B 2011-04-01 04:00:00.000 2011-05-01 07:00:00.000
A 2011-05-01 07:00:00.000 NULL
How can I achieve this in T-SQL using a set-based approach?
The DDL is as follows:
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME)
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00')
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00')
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00')
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00')
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00')
Select * from @t
;WITH cte1
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY PersonName
ORDER BY Date) AS G
FROM @t),
cte2
AS (SELECT PersonName,
MIN([Date]) StartDate,
ROW_NUMBER() OVER (ORDER BY MIN([Date])) AS rn
FROM cte1
GROUP BY PersonName,
G)
SELECT a.PersonName,
a.StartDate,
b.StartDate AS EndDate
FROM cte2 a
LEFT JOIN cte2 b
ON a.rn + 1 = b.rn
Because CTE results are generally not materialised, however, you may well find that you get better performance if you materialise the intermediate result yourself, as below.
DECLARE @t2 TABLE (
rn INT IDENTITY(1, 1) PRIMARY KEY,
PersonName VARCHAR(32),
StartDate DATETIME );
INSERT INTO @t2
SELECT PersonName,
MIN([Date]) StartDate
FROM (SELECT *,
ROW_NUMBER() OVER (ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY PersonName
ORDER BY Date) AS G
FROM @t) t
GROUP BY PersonName,
G
ORDER BY StartDate
SELECT a.PersonName,
a.StartDate,
b.StartDate AS EndDate
FROM @t2 a
LEFT JOIN @t2 b
ON a.rn + 1 = b.rn
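To see why the difference of the two ROW_NUMBER() values identifies each run, it may help to look at the intermediate columns on their own. This is only an illustrative sketch against the sample data from the question; the column names rn_all and rn_person are made up for the example.

```sql
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00');

-- rn_all numbers every row by date; rn_person restarts per PersonName.
-- Within an unbroken run of one person both counters advance together,
-- so their difference G stays constant until another person interrupts.
SELECT PersonName,
       [Date],
       ROW_NUMBER() OVER (ORDER BY [Date]) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Date]) AS rn_person,
       ROW_NUMBER() OVER (ORDER BY [Date])
         - ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Date]) AS G
FROM @t
ORDER BY [Date];

-- PersonName  Date                     rn_all  rn_person  G
-- A           2011-01-01 01:00:00.000  1       1          0
-- A           2011-01-02 02:00:00.000  2       2          0
-- A           2011-01-03 03:00:00.000  3       3          0
-- B           2011-01-04 04:00:00.000  4       1          3
-- A           2011-01-05 07:00:00.000  5       4          1
```

Grouping by (PersonName, G) therefore yields one row per run: (A, 0), (B, 3) and (A, 1).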
SELECT
PersonName,
StartDate = MIN(Date),
EndDate
FROM (
SELECT
PersonName,
Date,
EndDate = (
/* get the earliest date after current date
associated with a different person */
SELECT MIN(t1.Date)
FROM @t AS t1
WHERE t1.Date > t.Date
AND t1.PersonName <> t.PersonName
)
FROM @t AS t
) s
GROUP BY PersonName, EndDate
ORDER BY 2
Basically, for every Date we find the nearest later date that is associated with a different PersonName. That gives us EndDate, which now distinguishes consecutive groups of dates for the same person. Now we only need to group the data by PersonName and EndDate and take the minimal Date in every group as StartDate. And yes, sort the data by StartDate, of course.
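As an illustration, running just the inner query against the question's sample data shows how every row in a run ends up with the same EndDate. This is a sketch for inspection only, not a replacement for the full query.

```sql
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00');

-- For each row, look up the earliest later date that belongs to someone else.
SELECT t.PersonName,
       t.[Date],
       (SELECT MIN(t1.[Date])
        FROM @t AS t1
        WHERE t1.[Date] > t.[Date]
          AND t1.PersonName <> t.PersonName) AS EndDate
FROM @t AS t
ORDER BY t.[Date];

-- PersonName  Date                     EndDate
-- A           2011-01-01 01:00:00.000  2011-01-04 04:00:00.000
-- A           2011-01-02 02:00:00.000  2011-01-04 04:00:00.000
-- A           2011-01-03 03:00:00.000  2011-01-04 04:00:00.000
-- B           2011-01-04 04:00:00.000  2011-01-05 07:00:00.000
-- A           2011-01-05 07:00:00.000  NULL
```

The first three rows share one EndDate, so GROUP BY PersonName, EndDate collapses them into a single row whose MIN(Date) is the start of the run.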
Get a row number so you will know where the previous record is. Then, take a record and the next record after it. When the state changes we have a candidate row.
select
state,
min(start_timestamp),
max(end_timestamp)
from
(
select
first.state,
first.timestamp_ as start_timestamp,
second.timestamp_ as end_timestamp
from
(
select
*, row_number() over (order by timestamp_) as id
from test
) as first
left outer join
(
select
*, row_number() over (order by timestamp_) as id
from test
) as second
on
first.id = second.id - 1
and first.state != second.state
) as agg
group by state
having max(end_timestamp) is not null
union
-- the last row won't have an ending row
--(select state, timestamp_, null from test order by timestamp_ desc limit 1)
-- I think it is something like this for SQL Server
(select top 1 state, timestamp_, null from test order by timestamp_ desc)
order by 2
;
Tested with PostgreSQL but should work with SQL Server as well
The other answer with the CTE is a good one. Another option would be to iterate over the rows. It's not set based, but it is another way to do it.
You would need to iterate either (A) to assign each record a unique id that corresponds to its transaction, or (B) to produce your output directly.
T-SQL is not ideal for iterating over records, especially if you have a lot of them, so I would recommend doing it some other way, such as a small .NET program or something else that is better at iterating.
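For completeness, if the iteration does have to stay in T-SQL, option B might look roughly like the cursor loop below. This is only a sketch for small inputs: the @result table, the variable names and the cursor name are invented for the example, and it assumes PersonName is never NULL.

```sql
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00');

-- Hypothetical output table: one row per run of the same person.
DECLARE @result TABLE (PersonName VARCHAR(32), StartDate DATETIME, EndDate DATETIME NULL);
DECLARE @name VARCHAR(32), @date DATETIME, @prev VARCHAR(32);

DECLARE island_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT PersonName, [Date] FROM @t ORDER BY [Date];

OPEN island_cursor;
FETCH NEXT FROM island_cursor INTO @name, @date;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- A change of person closes the open run (if any) and starts a new one.
    IF @prev IS NULL OR @prev <> @name
    BEGIN
        UPDATE @result SET EndDate = @date WHERE EndDate IS NULL;
        INSERT INTO @result (PersonName, StartDate) VALUES (@name, @date);
    END

    SET @prev = @name;
    FETCH NEXT FROM island_cursor INTO @name, @date;
END

CLOSE island_cursor;
DEALLOCATE island_cursor;

-- The final run keeps EndDate = NULL because nothing follows it.
SELECT PersonName, StartDate, EndDate
FROM @result
ORDER BY StartDate;
```

At most one row in @result has a NULL EndDate at any time, which is why the UPDATE needs no further filter.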
There's a very quick way to do this using a bit of Gaps and Islands theory:
WITH CTE as (SELECT PersonName, [Date]
, Row_Number() over (ORDER BY [Date])
- Row_Number() over (ORDER BY PersonName, [Date]) as Island
FROM @t)
Select PersonName, Min([Date]), Max([Date])
from CTE
GROUP BY Island, PersonName
ORDER BY Min([Date])
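Note that the query above reports the last date inside each island as its end. If the end date should instead be the start of the next island, as in the question's desired output, one possible variation on SQL Server 2012 or later is to apply LEAD over the island start dates. This is a sketch under that version assumption; the Islands CTE name and the column aliases are chosen for the example.

```sql
DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00');

WITH CTE AS (
    -- Same row-number difference trick as above to label each island.
    SELECT PersonName, [Date],
           ROW_NUMBER() OVER (ORDER BY [Date])
             - ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Date]) AS Island
    FROM @t
),
Islands AS (
    -- One row per island, keeping only its first date.
    SELECT PersonName, MIN([Date]) AS StartDate
    FROM CTE
    GROUP BY PersonName, Island
)
-- LEAD (SQL Server 2012+) fetches the next island's start as this island's end.
SELECT PersonName,
       StartDate,
       LEAD(StartDate) OVER (ORDER BY StartDate) AS EndDate
FROM Islands
ORDER BY StartDate;

-- PersonName  StartDate                EndDate
-- A           2011-01-01 01:00:00.000  2011-01-04 04:00:00.000
-- B           2011-01-04 04:00:00.000  2011-01-05 07:00:00.000
-- A           2011-01-05 07:00:00.000  NULL
```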