Get total time interval from multiple rows if sequence not broken

2023-03-31 14:19 问答作者：

I have Work and Person tables (these are just examples to understand problem).

Structure

`Work` table

id INTEGER
person_id INTEGER
dt_from DATETIME
dt_to DATETIME

`Person` table

person_id INTEGER
name VARCHAR(50)

Data

`Work` table

id | person_id | dt_from    | dt_to
-------------------------------------------------
1  | 1         | 2011-01-01 | 2011-02-02
2  | 1         | 2011-02-02 | 2011-04-04
3  | 1         | 2011-06-06 | 2011-09-09
4  | 2         | 2011-01-01 | 2011-02-02
5  | 2         | 2011-02-02 | 2011-03-03
....etc.

`Person` table

Just person names with person id

Expected output

Person 1 : 2011-01-01 - 2011-04-04
Person 1 : 2011-06-06 - 2011-09-09
Person 2 : 2011-01-01 - 2011-03-03

Interval must be in seque开发者_JS百科nce. It cannot be broken somewhere in middle. Thats why Person 1 have two intervals.

I'm using postgres if it changes something. Have you any thougths? I wanted do it in one query, but if there is no such solution i will do some interval merge in php.

There may be a way to do this in one SQL select, but it escapes me. I managed to do it with one stored function though. Here's what I did for my testing:

create table work
(id integer, start_date date, end_date date);

insert into work values (1, '2011-01-01','2011-02-02');
insert into work values (1, '2011-02-02','2011-04-04');
insert into work values (1, '2011-06-06','2011-09-09');
insert into work values (2, '2011-01-01','2011-02-02');
insert into work values (2, '2011-02-02','2011-03-03');

create or replace function get_data() returns setof work as
$body$
declare
    res work%rowtype;
    sd  date := null;
begin
    for res in
        select
            w1.id,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
        from
            work w1
        order by
            id, start_date, end_date
    loop
        if res.start_date is not null and res.end_date is not null then
            return next res;
        elsif res.start_date is not null then
            sd := res.start_date;
        elsif res.end_date is not null then
            res.start_date := sd;
            return next res;
        end if;
    end loop;

    return;
end;$body$
  language 'plpgsql';

Then

select * from get_data() order by id, start_date;

returned this result:

 id | start_date |  end_date
----+------------+------------
  1 | 2011-01-01 | 2011-04-04
  1 | 2011-06-06 | 2011-09-09
  2 | 2011-01-01 | 2011-03-03
(3 rows)

which is, I think, what you're after.

You could try postgres's WITH RECURSIVE construct. (after all, a linked list is a kind of tree) Getting the boundary conditions right will be a problem, but at least it would solve the problem without the need for loops.

UPDATE: Added code. The problem with RECURSIVE is that you can only specify only the "tail" boundary condition. To specify the "head" condition, you need to wrap it into a view.

CREATE VIEW collected_time AS (
    WITH RECURSIVE ztree(person_id, dt_from, dt_to)  AS ( 
    -- Terminal part
    SELECT  pr.person_id, pr.dt_from, pr.dt_to
    FROM prikklok pr
    WHERE NOT EXISTS (
            SELECT * FROM prikklok px
            WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
            )
    UNION
    -- Recursive part
    SELECT  p1.person_id AS person_id
    , p1.dt_from AS dt_from
    , p2.dt_to AS dt_to
    FROM prikklok AS p1
    , ztree AS p2
    WHERE p1.person_id = p2.person_id
    AND p1.dt_to = p2.dt_from
    )
SELECT *
FROM ztree zt
WHERE NOT EXISTS (select *
    FROM prikklok p3
    WHERE p3.person_id = zt.person_id
        AND p3.dt_to = zt.dt_from
            )
    );

SELECT * FROM collected_time;

-- now generate some data with gaps

INSERT INTO prikklok
    SELECT serie_n
    , serie_t
    , serie_t + '1 month'::interval
FROM generate_series (1,10) serie_n
    , generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
    ;

DELETE FROM prikklok
WHERE random() <  0.001
    ;

-- a few indexes won't hurt
ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
    ;
CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);

The resulting query plan looks perfect:

                                                                              QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
   CTE ztree
     ->  Recursive Union  (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
           ->  Merge Anti Join  (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
                 Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok pr  (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
                 ->  Index Scan using prikklok_pkey on prikklok px  (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
           ->  Nested Loop  (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
                 ->  WorkTable Scan on ztree p2  (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p1  (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
                       Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
   ->  CTE Scan on ztree zt  (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
   ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p3  (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
         Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
 Total runtime: 41.354 ms
(15 rows)

继续阅读：datetime gaps-and-islands postgresql sql

Get total time interval from multiple rows if sequence not broken

Structure

`Work` table

`Person` table

Data

`Work` table

`Person` table

Expected output

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Structure

Work table

Person table

Data

Work table

Person table

Expected output

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

`Work` table

`Person` table

`Work` table

`Person` table

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？