Find the start and end date (set based) in T-SQL

2023-02-03 03:55 Q&A author:

I have the below.

Name    Date
A   2011-01-01 01:00:00.000
A   2011-02-01 02:00:00.000
A   2011-03-01 03:00:00.000
B   2011-04-01 04:00:00.000
A   2011-05-01 07:00:00.000

The desired output is

Name       StartDate                        EndDate
-------------------------------------------------------------------
A          2011-01-01 01:00:00.000         2011-04-01 04:00:00.000    
B          2011-04-01 04:00:00.000         2011-05-01 07:00:00.000    
A          2011-05-01 07:00:00.000         NULL

How can I achieve this in T-SQL using a set-based approach?

The DDL is as follows:

DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME) 
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00') 
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00') 
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00') 
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00') 
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00')

Select * from @t

;WITH cte1
     AS (SELECT *,
                ROW_NUMBER() OVER (ORDER BY Date) -
                ROW_NUMBER() OVER (PARTITION BY PersonName
                ORDER BY Date) AS G
         FROM   @t),
     cte2
     AS (SELECT PersonName,
                MIN([Date]) StartDate,
                ROW_NUMBER() OVER (ORDER BY  MIN([Date])) AS rn
         FROM   cte1
         GROUP  BY PersonName,
                   G)
SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   cte2 a
       LEFT JOIN cte2 b
         ON a.rn + 1 = b.rn

Because the results of CTEs are generally not materialised, you may well find you get better performance if you materialise the intermediate result yourself, as below.

DECLARE @t2 TABLE (
  rn         INT IDENTITY(1, 1) PRIMARY KEY,
  PersonName VARCHAR(32),
  StartDate  DATETIME );

INSERT INTO @t2
SELECT PersonName,
       MIN([Date]) StartDate
FROM   (SELECT *,
               ROW_NUMBER() OVER (ORDER BY Date) -
               ROW_NUMBER() OVER (PARTITION BY PersonName
               ORDER BY Date) AS G
        FROM   @t) t
GROUP  BY PersonName,
          G
ORDER  BY StartDate

SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   @t2 a
       LEFT JOIN @t2 b
         ON a.rn + 1 = b.rn

SELECT
  PersonName,
  StartDate = MIN(Date),
  EndDate
FROM (
  SELECT
    PersonName,
    Date,
    EndDate = (
      /* get the earliest date after current date
         associated with a different person */
      SELECT MIN(t1.Date)
      FROM @t AS t1
      WHERE t1.Date > t.Date
        AND t1.PersonName <> t.PersonName
    )
  FROM @t AS t
) s
GROUP BY PersonName, EndDate
ORDER BY 2

Basically, for every Date we find the nearest later date that is associated with a different PersonName. That gives us EndDate, which also distinguishes consecutive groups of dates for the same person.

Now we only need to group the data by PersonName & EndDate and get the minimal Date in every group as StartDate. And yes, sort the data by StartDate, of course.
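To see why grouping by PersonName and EndDate works, it can help to run the inner query on its own. The following is only a sketch against the sample data from the question's DDL; it introduces nothing beyond the table variable @t and the correlated subquery shown above.

```sql
-- Sample data copied from the question's DDL.
DECLARE @t TABLE (PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES ('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES ('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES ('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES ('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES ('A', '2011-01-05 07:00:00');

-- For each row: the earliest later date that belongs to a different person.
SELECT t.PersonName,
       t.[Date],
       EndDate = (SELECT MIN(t1.[Date])
                  FROM   @t AS t1
                  WHERE  t1.[Date] > t.[Date]
                    AND  t1.PersonName <> t.PersonName)
FROM   @t AS t
ORDER  BY t.[Date];
```

With this data, the three leading 'A' rows all share the EndDate 2011-01-04 04:00, the 'B' row gets 2011-01-05 07:00, and the final 'A' row gets NULL, so each consecutive run has its own (PersonName, EndDate) pair.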

Get a row number so you will know where the previous record is. Then, take a record and the next record after it. When the state changes we have a candidate row.

select 
  state, 
  min(start_timestamp),
  max(end_timestamp)

from
(
    select
        first.state, 
        first.timestamp_ as start_timestamp,
        second.timestamp_ as end_timestamp

        from
        (
            select
                *, row_number() over (order by timestamp_) as id
            from test
        ) as first

        left outer join
        (
            select
                *, row_number() over (order by timestamp_) as id
            from test
        ) as second
        on 
            first.id = second.id - 1 
            and first.state != second.state
) as agg
group by state
    having max(end_timestamp) is not null 

union

-- the last row won't have an ending row
--(select state, timestamp_, null from test order by timestamp_ desc limit 1)
    -- I think it is something like this for SQL Server
     (select top 1 state, timestamp_, null from test order by timestamp_ desc)

order by 2
;

Tested with PostgreSQL but should work with SQL Server as well
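The "compare each row with its neighbour" idea can also be written with LAG and LEAD instead of a self-join on row numbers. This is a sketch under assumptions, not the answer author's query: it assumes SQL Server 2012 or later (where LAG and LEAD are available) and uses the question's @t table rather than the test table above. The CTE names marked and starts are made up for illustration.

```sql
-- Sample data copied from the question's DDL.
DECLARE @t TABLE (PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES ('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES ('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES ('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES ('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES ('A', '2011-01-05 07:00:00');

;WITH marked AS (
    -- Attach the previous row's name (by date order) to every row.
    SELECT PersonName,
           [Date],
           LAG(PersonName) OVER (ORDER BY [Date]) AS PrevName
    FROM   @t
),
starts AS (
    -- Keep only the rows where the name changes: these start a new run.
    SELECT PersonName,
           [Date] AS StartDate
    FROM   marked
    WHERE  PrevName IS NULL
       OR  PrevName <> PersonName
)
-- A run ends where the next run starts; the last run has no end.
SELECT PersonName,
       StartDate,
       LEAD(StartDate) OVER (ORDER BY StartDate) AS EndDate
FROM   starts
ORDER  BY StartDate;
```

The sketch assumes PersonName is never NULL, since a NULL PrevName is used to recognise the first row.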

The other answer with the CTE is a good one. Another option would be to iterate over the collection. It's not set based, but it is another way to do it.

You will need to iterate either (A) to assign a unique id to each record that corresponds to its transaction, or (B) to actually get your output.

T-SQL is not ideal for iterating over records, especially if you have a lot of them, so I would recommend some other way of doing it: a small .NET program or something else that is better at iterating.
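For completeness, here is roughly what the iterative route (option B) could look like if it were done in T-SQL anyway, using a cursor. It is only a sketch: the @result table and all variable names are made up for illustration, and the set-based queries in the other answers will usually be the better choice.

```sql
-- Sample data copied from the question's DDL.
DECLARE @t TABLE (PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES ('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES ('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES ('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES ('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES ('A', '2011-01-05 07:00:00');

-- Hypothetical output table and loop state.
DECLARE @result TABLE (PersonName VARCHAR(32), StartDate DATETIME, EndDate DATETIME NULL);
DECLARE @name VARCHAR(32), @date DATETIME;
DECLARE @curName VARCHAR(32), @curStart DATETIME;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT PersonName, [Date] FROM @t ORDER BY [Date];

OPEN c;
FETCH NEXT FROM c INTO @name, @date;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- A change of name closes the current run and opens a new one.
    IF @curName IS NULL OR @curName <> @name
    BEGIN
        IF @curName IS NOT NULL
            INSERT INTO @result VALUES (@curName, @curStart, @date);
        SET @curName  = @name;
        SET @curStart = @date;
    END
    FETCH NEXT FROM c INTO @name, @date;
END

-- The final run has no end date.
IF @curName IS NOT NULL
    INSERT INTO @result VALUES (@curName, @curStart, NULL);

CLOSE c;
DEALLOCATE c;

SELECT PersonName, StartDate, EndDate
FROM   @result
ORDER  BY StartDate;
```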

There's a very quick way to do this using a bit of Gaps and Islands theory:

-- The leading semicolon terminates the previous statement, as WITH requires.
;WITH CTE AS (SELECT PersonName, [Date]
                    , Row_Number() over (ORDER BY [Date])
                      - Row_Number() over (ORDER BY PersonName, [Date]) as Island
              FROM @t)

Select PersonName, Min([Date]) AS StartDate, Max([Date]) AS EndDate
from CTE
GROUP BY Island, PersonName
ORDER BY Min([Date])
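To see where the Island value comes from, the two row numbers can be listed side by side. The sketch below uses the sample data from the question's DDL; the column aliases rn_all and rn_by_name are made up for illustration.

```sql
-- Sample data copied from the question's DDL.
DECLARE @t TABLE (PersonName VARCHAR(32), [Date] DATETIME);
INSERT INTO @t VALUES ('A', '2011-01-01 01:00:00');
INSERT INTO @t VALUES ('A', '2011-01-02 02:00:00');
INSERT INTO @t VALUES ('A', '2011-01-03 03:00:00');
INSERT INTO @t VALUES ('B', '2011-01-04 04:00:00');
INSERT INTO @t VALUES ('A', '2011-01-05 07:00:00');

SELECT PersonName,
       [Date],
       -- Position of the row among all rows, by date.
       ROW_NUMBER() OVER (ORDER BY [Date])             AS rn_all,
       -- Position of the row when sorted by name first, then date.
       ROW_NUMBER() OVER (ORDER BY PersonName, [Date]) AS rn_by_name,
       -- The difference stays constant within a consecutive run of one name.
       ROW_NUMBER() OVER (ORDER BY [Date])
         - ROW_NUMBER() OVER (ORDER BY PersonName, [Date]) AS Island
FROM   @t
ORDER  BY [Date];
```

With this data the first three 'A' rows get Island 0, the 'B' row gets -1 and the last 'A' row gets 1, which is why grouping by Island and PersonName separates the runs. Note that Max([Date]) in the query above is the last date inside each run, not the start of the next run as in the question's desired output.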
