Determine distance between boxes

2023-03-23 01:18 Q&A author:

This problem has been giving me a headache for a while. I have a PostgreSQL 8.4 database that consists of a single table containing more than 4,000,000 records. The table is structured as follows:

CREATE TABLE metadata (
  id serial NOT NULL,
  "value" text NOT NULL DEFAULT ''::text,
  segment_box box NOT NULL DEFAULT box(point(((-9223372036854775808)::bigint)::double precision, (1)::double precision), point((9223372036854775807::bigint)::double precision, ((-1))::double precision)),
  CONSTRAINT metadata_pk PRIMARY KEY (id)
);

CREATE INDEX metadata_segment_box_ix
  ON metadata
  USING gist
  (segment_box);

CREATE INDEX metadata_tag_value_ix
  ON metadata
  USING btree
  (value);

The table contains time segments represented as rectangular boxes. Each segment is annotated through the "value" column.

The queries I would like to run find all pairs of segments, one with value 'X' and one with value 'Y', that lie within a certain window (distance) of each other. A query that achieves this is:

SELECT * FROM (SELECT * FROM metadata WHERE value='X') a,
  (SELECT * FROM metadata WHERE value='Y') b
WHERE a.segment_box <-> b.segment_box <= 3000;
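To see where the cost comes from, the Python sketch below spells out what the cross join asks for. Every 'X' segment is compared with every 'Y' segment, so the number of distance evaluations is len(xs) * len(ys). For illustration it assumes each segment is an (id, start, end) tuple. It measures distance as the gap between the two time intervals, which is 0 when they overlap. That distance function is an assumption: PostgreSQL's `box <-> box` operator may compute distance differently. `naive_pairs` and `interval_gap` are hypothetical names.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, float, float]  # (id, start, end) along the time axis


def interval_gap(a: Segment, b: Segment) -> float:
    """Assumed distance: gap between two time intervals, 0 if they overlap."""
    return max(0.0, max(a[1], b[1]) - min(a[2], b[2]))


def naive_pairs(xs: List[Segment], ys: List[Segment], limit: float,
                dist: Callable[[Segment, Segment], float] = interval_gap):
    """Nested-loop equivalent of the cross join: len(xs) * len(ys) comparisons."""
    comparisons = 0
    result = []
    for a in xs:
        for b in ys:
            comparisons += 1
            if dist(a, b) <= limit:
                result.append((a[0], b[0]))
    return result, comparisons


if __name__ == "__main__":
    xs = [(1, 0.0, 10.0), (2, 50000.0, 50100.0)]
    ys = [(3, 12000.0, 12500.0), (4, 1000.0, 2000.0)]
    pairs, n = naive_pairs(xs, ys, 3000)
    print(pairs, n)  # [(1, 4)] 4
```

With millions of rows on each side, the comparison count is what makes the query take minutes.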

But, as you have probably noticed, the database cannot execute this query efficiently: the Cartesian product of subqueries a and b becomes very large. Is there a way to run these queries more efficiently? I imagine some sort of sliding-window approach would do the trick, maybe something like the following:

SELECT *, rank() OVER (
  PARTITION BY "value"
  -- x of the lower-left corner (start), then x of the upper-right corner (end)
  ORDER BY (segment_box[1])[0], (segment_box[0])[0]
) FROM metadata WHERE value='X' OR value='Y';
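The ORDER BY above relies on how PostgreSQL stores a box. Whatever order the two corners are given in, the box is normalized so that index 0 holds the upper-right corner and index 1 the lower-left corner. Assuming time runs along the x axis, `(segment_box[1])[0]` is a segment's start and `(segment_box[0])[0]` its end. The Python sketch below models that normalization and the resulting sort key. `make_box` and `sort_key` are hypothetical helpers, not PostgreSQL functions.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # (upper_right, lower_left), like PostgreSQL's box[0], box[1]


def make_box(p1: Point, p2: Point) -> Box:
    """Model box(point, point): reorder corners to (upper_right, lower_left)."""
    (x1, y1), (x2, y2) = p1, p2
    upper_right = (max(x1, x2), max(y1, y2))
    lower_left = (min(x1, x2), min(y1, y2))
    return (upper_right, lower_left)


def sort_key(box: Box) -> Tuple[float, float]:
    """ORDER BY (segment_box[1])[0], (segment_box[0])[0]: start, then end."""
    return (box[1][0], box[0][0])


if __name__ == "__main__":
    # Corners supplied "backwards" are still stored as (upper_right, lower_left).
    print(make_box((10.0, -1.0), (4.0, 1.0)))  # ((10.0, 1.0), (4.0, -1.0))
    segments = [make_box((5, -1), (9, 1)), make_box((2, -1), (8, 1)), make_box((2, -1), (3, 1))]
    print([sort_key(s) for s in sorted(segments, key=sort_key)])  # [(2, 3), (2, 8), (5, 9)]
```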

Update: One of the things I tried after posting this question was writing a custom function in Postgres:

CREATE OR REPLACE FUNCTION within_window(size bigint DEFAULT 0)
  RETURNS setof metadata AS
$BODY$DECLARE
  segment RECORD;
  neighbour RECORD;
  newwindow box;
BEGIN
  FOR segment IN (
    SELECT * FROM metadata WHERE value='X' OR value='Y' 
      ORDER BY (segment_box[1])[0], (segment_box[0])[0]
  ) LOOP
    newwindow := box(segment.segment_box[0], 
      point((((segment.segment_box[1])[0]) + size), (segment.segment_box[1])[1]));
    FOR neighbour IN (
      SELECT DISTINCT ON (id) * FROM metadata
        WHERE (value='X' OR value='Y')
        AND segment_box &< newwindow
        AND segment_box &> newwindow
    ) LOOP
      RETURN NEXT neighbour;
    END LOOP;
  END LOOP;
END;$BODY$
  LANGUAGE plpgsql;

However, this function is just as slow as the basic solution described above, because the inner subquery has to be executed once for every outer row. Any other thoughts on this?
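The window test in this function can be modelled in Python to see why it scales badly. This sketch assumes each candidate row is an (id, start, end) tuple already filtered to 'X' or 'Y'. It reproduces `newwindow` literally, including the corner normalization that `box(point, point)` applies. It models `segment_box &< newwindow AND segment_box &> newwindow` as "the segment's x-extent lies inside the window's x-extent", because `&<` means "does not extend to the right of" and `&>` means "does not extend to the left of". The inner loop stands in for the inner query, which is re-run for every outer row, so the number of tests grows with the square of the row count. All names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, float, float]  # (id, start, end) along the time axis


def window_x_extent(seg: Segment, size: float) -> Tuple[float, float]:
    """x-range of box(segment_box[0], point(start + size, bottom)) after normalization."""
    _, start, end = seg
    # segment_box[0] is the upper-right corner (x = end); the second point has x = start + size.
    return min(end, start + size), max(end, start + size)


def inside(seg: Segment, lo: float, hi: float) -> bool:
    """seg &< window (end <= hi) AND seg &> window (start >= lo)."""
    return lo <= seg[1] and seg[2] <= hi


def per_row_scan(rows: List[Segment], size: float) -> Tuple[List[int], int]:
    """Outer loop over sorted rows; the inner loop re-examines every row each time."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    out: List[int] = []
    tests = 0
    for seg in ordered:
        lo, hi = window_x_extent(seg, size)
        for other in ordered:  # stands in for the inner query, re-run per outer row
            tests += 1
            if inside(other, lo, hi):
                out.append(other[0])
    return out, tests


if __name__ == "__main__":
    print(per_row_scan([(1, 0, 10), (2, 12, 20), (3, 30, 40)], 30))  # ([2, 3], 9)
```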

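I solved the problem myself with a kind of sweep-line algorithm. Only one query is executed, and a scrollable cursor sweeps back and forth over its result set. For each row, the cursor scans forward and returns the following rows as long as they lie inside that row's window. It stops at the first row that does not, then moves back so that the next iteration starts at the row after the current one. The function looks like this: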

CREATE OR REPLACE FUNCTION within_window(size bigint DEFAULT 0)
  RETURNS setof metadata AS
$BODY$DECLARE
  -- One scrollable cursor over all candidate rows, sorted by start, then end
  crsr SCROLL CURSOR FOR (SELECT * FROM metadata WHERE value='X' OR value='Y'
                          ORDER BY (segment_box[1])[0], (segment_box[0])[0]);
  rc RECORD;          -- current row (position of the sweep line)
  rcc RECORD;         -- candidate row following rc
  newwindow box;      -- window derived from rc
  crsr_position int;  -- cursor position just before rc was fetched
  last_crsr int;
BEGIN
  OPEN crsr;
  crsr_position := 0;
  LOOP
    FETCH NEXT FROM crsr INTO rc;
    EXIT WHEN NOT FOUND;
    last_crsr := crsr_position;
    newwindow := box(rc.segment_box[0],
                     point((rc.segment_box[1])[0] + size, (rc.segment_box[1])[1]));
    -- Scan forward and return rows while they lie inside rc's window
    LOOP
      FETCH NEXT FROM crsr INTO rcc;
      IF NOT FOUND THEN
        EXIT;
      ELSIF rcc.segment_box &< newwindow AND rcc.segment_box &> newwindow THEN
        RETURN NEXT rcc;
      ELSE
        EXIT;  -- the first row outside the window ends the scan
      END IF;
    END LOOP;
    -- Move back onto rc, so the next outer FETCH returns the row after it
    crsr_position := last_crsr + 1;
    MOVE ABSOLUTE crsr_position FROM crsr;
  END LOOP;
  CLOSE crsr;
END;$BODY$
  LANGUAGE plpgsql;

Using this function, the query takes only 476 ms instead of more than 6 minutes (on the database with more than 4 million rows)!
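The cursor logic can be mirrored in Python to check its behaviour outside the database. This sketch assumes each candidate row is an (id, start, end) tuple already filtered to 'X' or 'Y'. It reproduces the function's window expression literally and models `&< ... AND &> ...` as containment of the x-extent. Each outer row only scans forward until the first row that misses its window, which mirrors the `ELSE EXIT` branch and the `MOVE ABSOLUTE` back onto the current row. `sweep` and its helpers are hypothetical names.

```python
from typing import List, Tuple

Segment = Tuple[int, float, float]  # (id, start, end) along the time axis


def window_x_extent(rc: Segment, size: float) -> Tuple[float, float]:
    """x-range of box(rc.segment_box[0], point(start + size, bottom)) after normalization."""
    _, start, end = rc
    return min(end, start + size), max(end, start + size)


def inside(seg: Segment, lo: float, hi: float) -> bool:
    """Model of seg &< window AND seg &> window for boxes on the x axis."""
    return lo <= seg[1] and seg[2] <= hi


def sweep(rows: List[Segment], size: float) -> List[int]:
    """Mirror of the cursor-based within_window(): one sorted pass, short forward scans."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))  # ORDER BY start, end
    out: List[int] = []
    for i, rc in enumerate(ordered):                      # outer FETCH NEXT
        lo, hi = window_x_extent(rc, size)
        j = i + 1                                         # inner FETCHes start after rc
        while j < len(ordered) and inside(ordered[j], lo, hi):
            out.append(ordered[j][0])                     # RETURN NEXT rcc
            j += 1
        # A miss (ELSE EXIT) or running out of rows (NOT FOUND) ends the inner scan;
        # MOVE ABSOLUTE then puts the cursor back on rc for the next outer FETCH.
    return out


if __name__ == "__main__":
    print(sweep([(1, 0, 10), (2, 12, 20), (3, 30, 40)], 30))  # [2, 3]
    # Row 3 fits row 1's window [10, 30], but row 2 (ending at 40) stops the scan first.
    print(sweep([(1, 0, 10), (2, 12, 40), (3, 13, 20)], 30))  # []
```

The second example shows a consequence of the early exit. A row that would fit a window is never examined if an earlier row in sort order does not fit. The sweep trades that completeness for touching only a few neighbours per row; whether this matters depends on how the segments are distributed.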

Tags: algorithm database postgis postgresql sql
