How to avoid overlapping date ranges when using a grouping clause?

2023-02-03 03:19 Q&A author:

I have a situation where I need to find the time spans between value changes. I tried a simple group by clause, but it collapses separate runs of the same class into a single range, so the resulting ranges overlap. Consider the following example:

create table #items (
      code varchar(4)
    , class varchar(4)
    , txdate datetime
)

insert into #items (code, class, txdate) values ('A', 'C', '2010-01-01');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-02');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-03');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-04');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-05');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-06');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-07');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-08');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-09');

select code
, class
, min(txdate) mindate
, max(txdate) maxdate
from #items
group by code, class

This returns the following results (notice the overlapping date ranges):

|code|class|mindate   |maxdate   |
----------------------------------
|A   |C    |2010-01-01|2010-01-07|
|A   |D    |2010-01-04|2010-01-09|

I would like to have the query return the following:

|code|class|mindate   |maxdate   |
----------------------------------
|A   |C    |2010-01-01|2010-01-03|
|A   |D    |2010-01-04|2010-01-05|
|A   |C    |2010-01-06|2010-01-07|
|A   |D    |2010-01-08|2010-01-09|

Any ideas and suggestions?

EDIT: As pointed out in the comments, this is still not quite right.

;with cteNtile as (
    select code, class, txdate, 
           ntile((select count(*) from (select NULL as dummy from #items group by code, class) a)) over(partition by code, class order by txdate) as tilenum
        from #items
)
select code, class, MIN(txdate) as mindate, MAX(txdate) as maxdate
    from cteNtile
    group by code, class, tilenum
    order by mindate, maxdate

Here is the query that can give you the desired result.

;WITH items1 AS (
SELECT ROW_NUMBER() OVER (ORDER BY txdate) rowid, code, class, txdate
from #items
),
items2 AS (
SELECT ROW_NUMBER() OVER (ORDER BY rowid) id, rowid, i1.Code, i1.Class, i1.txdate
FROM items1 i1
WHERE NOT EXISTS (SELECT 1 FROM items1 i2
                  WHERE i2.txdate < i1.txdate
                  AND i2.class = i1.class
                  AND i2.Code = i1.Code 
                  AND i2.rowid+1=i1.rowid)
)
SELECT items2.code, items2.class, items2.txdate mindate, items1.txdate maxdate
FROM items2,  items2 items3, items1 
WHERE (items2.id+1=items3.id AND items3.rowid-1=items1.rowid)
OR items2.rowid = (SELECT MAX(t.rowid) FROM items1 t)
UNION 
SELECT items2.code, items2.class, MAX(items2.txdate) mindate, MAX(items1.txdate) maxdate
FROM items2, items1 
WHERE items1.class = items2.class
GROUP BY items1.class, items2.class, items2.code
ORDER BY mindate

I think you can't do this with a simple select statement.

You can use a cursor to iterate over the rows and identify the "class" changes.
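As a rough illustration of the cursor idea, a minimal sketch might look like the following. It assumes the #items table from the question exists in the current session, that code and class are never null, and that rows should be walked in code, txdate order. The @result table variable, the cursor name and all local variable names are made up for this example.

```sql
-- Sketch only: assumes #items from the question; all names below are illustrative.
declare @result table (
      code varchar(4)
    , class varchar(4)
    , mindate datetime
    , maxdate datetime
);

declare @code varchar(4), @class varchar(4), @txdate datetime;
declare @curcode varchar(4), @curclass varchar(4);
declare @mindate datetime, @maxdate datetime;

declare item_cursor cursor local fast_forward for
    select code, class, txdate
    from #items
    order by code, txdate;

open item_cursor;
fetch next from item_cursor into @code, @class, @txdate;

while @@fetch_status = 0
begin
    if @curcode is null
    begin
        -- First row: open the first span.
        select @curcode = @code, @curclass = @class, @mindate = @txdate;
    end
    else if @code <> @curcode or @class <> @curclass
    begin
        -- Code or class changed: store the finished span, then open a new one.
        insert into @result (code, class, mindate, maxdate)
        values (@curcode, @curclass, @mindate, @maxdate);

        select @curcode = @code, @curclass = @class, @mindate = @txdate;
    end

    -- The row just read is, so far, the last row of the open span.
    set @maxdate = @txdate;

    fetch next from item_cursor into @code, @class, @txdate;
end

-- Store the span that was still open when the rows ran out.
if @curcode is not null
    insert into @result (code, class, mindate, maxdate)
    values (@curcode, @curclass, @mindate, @maxdate);

close item_cursor;
deallocate item_cursor;

select code, class, mindate, maxdate
from @result
order by code, mindate;
```

With the sample rows from the question this should yield the four non-overlapping spans that were asked for. A cursor processes one row at a time, so on large tables a set-based query is usually the better choice.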

After researching SQL SERVER ISLANDS as suggested by @KM, I came up with the following query which seems to work well when additional class codes are added to the dataset.

select a.code, a.class, a.txdate as mindate, b.txdate as maxdate
from (
    --Find minimum island
    select code
        , class
        , txdate
        , row_number() over (order by code, class, txdate) as n
    from #items tb1
    where not exists (
        select *
        from #items tb2
        where datediff(d, tb1.txdate, tb2.txdate) = -1      
          and tb1.class = tb2.class
          and tb1.code = tb2.code
    ) 
) as a
inner join (
    --Find maximum island 
    select code
        , class
        , txdate
        , row_number() over (order by code, class, txdate) as n
    from #items tb1
    where not exists (
        select *
        from #items tb2
        where datediff(d, tb1.txdate, tb2.txdate) = 1   
          and tb1.class = tb2.class
          and tb1.code = tb2.code
    ) 
) as b on a.n = b.n

One caveat to this approach is that the number of entries in the minimum set needs to match the number of entries in the maximum set. So far I have not found a dataset that breaks this. Note also that the datediff(d, ...) checks treat two rows as part of the same island only when their dates are exactly one day apart. I did not test null values or performance.
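For comparison, a common alternative for this kind of "islands" problem is to subtract two row_number() sequences, which avoids pairing a minimum set with a maximum set. The sketch below is not the query from the answer above. It assumes the #items table from the question and that txdate is unique within each code; the CTE name numbered and the column grp are made up for illustration.

```sql
-- Sketch only: assumes #items from the question; "numbered" and "grp" are illustrative names.
;with numbered as (
    select code
         , class
         , txdate
           -- Position within the code minus position within the (code, class) pair.
           -- The difference stays constant while the class does not change.
         , row_number() over (partition by code order by txdate)
         - row_number() over (partition by code, class order by txdate) as grp
    from #items
)
select code
     , class
     , min(txdate) as mindate
     , max(txdate) as maxdate
from numbered
group by code, class, grp
order by code, mindate;
```

On the sample data, the class C rows get grp values 0, 0, 0, 2, 2 and the class D rows get 3, 3, 5, 5, so grouping by code, class and grp yields the four requested spans. Because the grouping depends only on row order, it does not require the dates within a span to be exactly one day apart.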
