开发者

How to select overlapping value pairs from DB

In my postgres db I have a table with 3 columns like this:

start | end  | sorce
17    | 23   | A
150   | 188  | A
200   | 260  | A
19    | 30   | B
105   | 149  | B
199   | 220  | B
...

I would like select all row where the regions (start to end) from the differed sources A and B are overlapping.

UPDATE:

From postgres version 8.4 it is posible to solve the problem with window functions. It is开发者_开发问答 much more faster than the join or subselect methodes. Link to postgres wiki.


This works as a brute-force approach (I renamed your columns range_start and range_end to avoid conflict with the reserved word "end"):

select *
from t cross join t t2
where t2.source <> t.source
      and box(point(t2.range_start,t2.range_start),point(t2.range_end,t2.range_end))
            && box(point(t.range_start,t.range_start),point(t.range_end,t.range_end))

or

select *
from t
where exists (select 1 from t t2
              where t2.source <> t.source and box(point(t2.range_start,t2.range_start),point(t2.range_end,t2.range_end))
                  && box(point(t.range_start,t.range_start),point(t.range_end,t.range_end)))

You should then be able to use a GiST index which may make this a more efficient (seq scan + index scan):

create index t_range_idx on t using gist (box(point(range_start,range_start),point(range_end,range_end))

This function might aid understanding by clearing up the SQL:

create function range(not_before int, not_after int) returns box
   strict immutable language sql
   as $$ select box(point($1,$1),point($2,$2)) $$;

With this, you can write:

select * from t where range(range_start,range_end) && range(10,20);

and note that the box && box operator means "overlaps".


If you want all pairings, then INNER JOIN A and B using the classic overlap test

a.start < b.end and b.start < a.end

i.e

select a.start a_start, a.end a_end, b.start b_start, b.end b_end
from tbl a
inner join tbl b on a.start < b.end and b.start < a.end and b.source = 'B'
where a.source = 'A'

If you do not mean sources literally 'A' and 'B', just that they are different, you can use the below instead

select a.start a_start, a.end a_end, b.start b_start, b.end b_end
from tbl a
inner join tbl b on a.start < b.end and b.start < a.end and a.source <> b.source

Depending on your definition of overlapping, swap the < with <= (both times)

  • < : 10-20 does not overlap 20-30
  • <= : 10-20 does overlap 20-30
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜