How to select overlapping value pairs from DB
In my postgres db I have a table with 3 columns like this:
start | end | sorce
17 | 23 | A
150 | 188 | A
200 | 260 | A
19 | 30 | B
105 | 149 | B
199 | 220 | B
...
I would like select all row where the regions (start to end) from the differed sources A and B are overlapping.
UPDATE:
From postgres version 8.4 it is posible to solve the problem with window functions. It is开发者_开发问答 much more faster than the join or subselect methodes. Link to postgres wiki.
This works as a brute-force approach (I renamed your columns range_start and range_end to avoid conflict with the reserved word "end"):
select *
from t cross join t t2
where t2.source <> t.source
and box(point(t2.range_start,t2.range_start),point(t2.range_end,t2.range_end))
&& box(point(t.range_start,t.range_start),point(t.range_end,t.range_end))
or
select *
from t
where exists (select 1 from t t2
where t2.source <> t.source and box(point(t2.range_start,t2.range_start),point(t2.range_end,t2.range_end))
&& box(point(t.range_start,t.range_start),point(t.range_end,t.range_end)))
You should then be able to use a GiST index which may make this a more efficient (seq scan + index scan):
create index t_range_idx on t using gist (box(point(range_start,range_start),point(range_end,range_end))
This function might aid understanding by clearing up the SQL:
create function range(not_before int, not_after int) returns box
strict immutable language sql
as $$ select box(point($1,$1),point($2,$2)) $$;
With this, you can write:
select * from t where range(range_start,range_end) && range(10,20);
and note that the box && box
operator means "overlaps".
If you want all pairings, then INNER JOIN A and B using the classic overlap test
a.start < b.end and b.start < a.end
i.e
select a.start a_start, a.end a_end, b.start b_start, b.end b_end
from tbl a
inner join tbl b on a.start < b.end and b.start < a.end and b.source = 'B'
where a.source = 'A'
If you do not mean sources literally
'A' and 'B', just that they are different, you can use the below instead
select a.start a_start, a.end a_end, b.start b_start, b.end b_end
from tbl a
inner join tbl b on a.start < b.end and b.start < a.end and a.source <> b.source
Depending on your definition of overlapping, swap the <
with <=
(both times)
<
: 10-20 does not overlap 20-30<=
: 10-20 does overlap 20-30
精彩评论