Get total time interval from multiple rows if sequence not broken
I have Work
and Person
tables (these are just examples to understand problem).
Structure
Work
table
id INTEGER
person_id INTEGER
dt_from DATETIME
dt_to DATETIME
Person
table
person_id INTEGER
name VARCHAR(50)
Data
Work
table
id | person_id | dt_from | dt_to
-------------------------------------------------
1 | 1 | 2011-01-01 | 2011-02-02
2 | 1 | 2011-02-02 | 2011-04-04
3 | 1 | 2011-06-06 | 2011-09-09
4 | 2 | 2011-01-01 | 2011-02-02
5 | 2 | 2011-02-02 | 2011-03-03
....etc.
Person
table
Just person names with person id
Expected output
Person 1 : 2011-01-01 - 2011-04-04
Person 1 : 2011-06-06 - 2011-09-09
Person 2 : 2011-01-01 - 2011-03-03
Interval must be in seque开发者_JS百科nce. It cannot be broken somewhere in middle. Thats why Person 1 have two intervals.
I'm using postgres if it changes something. Have you any thougths? I wanted do it in one query, but if there is no such solution i will do some interval merge in php.
There may be a way to do this in one SQL select, but it escapes me. I managed to do it with one stored function though. Here's what I did for my testing:
create table work
(id integer, start_date date, end_date date);
insert into work values (1, '2011-01-01','2011-02-02');
insert into work values (1, '2011-02-02','2011-04-04');
insert into work values (1, '2011-06-06','2011-09-09');
insert into work values (2, '2011-01-01','2011-02-02');
insert into work values (2, '2011-02-02','2011-03-03');
create or replace function get_data() returns setof work as
$body$
declare
res work%rowtype;
sd date := null;
begin
for res in
select
w1.id,
case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
from
work w1
order by
id, start_date, end_date
loop
if res.start_date is not null and res.end_date is not null then
return next res;
elsif res.start_date is not null then
sd := res.start_date;
elsif res.end_date is not null then
res.start_date := sd;
return next res;
end if;
end loop;
return;
end;$body$
language 'plpgsql';
Then
select * from get_data() order by id, start_date;
returned this result:
id | start_date | end_date
----+------------+------------
1 | 2011-01-01 | 2011-04-04
1 | 2011-06-06 | 2011-09-09
2 | 2011-01-01 | 2011-03-03
(3 rows)
which is, I think, what you're after.
You could try postgres's WITH RECURSIVE construct. (after all, a linked list is a kind of tree) Getting the boundary conditions right will be a problem, but at least it would solve the problem without the need for loops.
UPDATE: Added code. The problem with RECURSIVE is that you can only specify only the "tail" boundary condition. To specify the "head" condition, you need to wrap it into a view.
CREATE VIEW collected_time AS (
WITH RECURSIVE ztree(person_id, dt_from, dt_to) AS (
-- Terminal part
SELECT pr.person_id, pr.dt_from, pr.dt_to
FROM prikklok pr
WHERE NOT EXISTS (
SELECT * FROM prikklok px
WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
)
UNION
-- Recursive part
SELECT p1.person_id AS person_id
, p1.dt_from AS dt_from
, p2.dt_to AS dt_to
FROM prikklok AS p1
, ztree AS p2
WHERE p1.person_id = p2.person_id
AND p1.dt_to = p2.dt_from
)
SELECT *
FROM ztree zt
WHERE NOT EXISTS (select *
FROM prikklok p3
WHERE p3.person_id = zt.person_id
AND p3.dt_to = zt.dt_from
)
);
SELECT * FROM collected_time;
-- now generate some data with gaps
INSERT INTO prikklok
SELECT serie_n
, serie_t
, serie_t + '1 month'::interval
FROM generate_series (1,10) serie_n
, generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
;
DELETE FROM prikklok
WHERE random() < 0.001
;
-- a few indexes won't hurt
ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
;
CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);
The resulting query plan looks perfect:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
CTE ztree
-> Recursive Union (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
-> Merge Anti Join (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
-> Index Scan using prikklok_person_id_dt_to_idx on prikklok pr (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
-> Index Scan using prikklok_pkey on prikklok px (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
-> Nested Loop (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
-> WorkTable Scan on ztree p2 (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
-> Index Scan using prikklok_person_id_dt_to_idx on prikklok p1 (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
-> CTE Scan on ztree zt (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
-> Index Scan using prikklok_person_id_dt_to_idx on prikklok p3 (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
Total runtime: 41.354 ms
(15 rows)
精彩评论