开发者

Postgres 9.0 slow query

I am evaluating postgres to migrate away from Oracle. The following query runs too slow, also please find the explain plan:

explain analyze select DISTINCT EVENT.ID, 
   ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREF开发者_如何学JAVAERRED_ORIGIN, 
   EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, 
   ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
   ORIGIN.CONTRIBUTOR OCONTRIBUTOR, 
   MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
from event.event 
   left join event.origin on event.id = origin.eventid 
   left join event.magnitude on origin.id = event.magnitude.origin_id 
WHERE EXISTS(
    select origin_id from event.magnitude 
    where magnitude.magnitude>=7.2 
    and origin.id=origin_id) 
order by ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
    EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID


"Unique  (cost=740549.86..741151.42 rows=15039 width=80) (actual time=17791.557..17799.092 rows=5517 loops=1)"
"  ->  Sort  (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.556..17792.220 rows=5517 loops=1)"
"        Sort Key: origin."time", event.magnitude.magnitude, event.id, event.preferred_origin_id, origin.id, event.contributor, origin.latitude, origin.longitude, origin.depth, origin.evtype, origin.catalog, origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
"        Sort Method:  quicksort  Memory: 968kB"
"        ->  Nested Loop Left Join  (cost=34642.50..739506.42 rows=15039 width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
"              ->  Hash Semi Join  (cost=34642.50..723750.23 rows=14382 width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
"                    Hash Cond: (origin.id = event.magnitude.origin_id)"
"                    ->  Merge Left Join  (cost=0.00..641544.72 rows=6133105 width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
"                          Merge Cond: (event.id = origin.eventid)"
"                          ->  Index Scan using event_key_index on event  (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 rows=3276192 loops=1)"
"                          ->  Index Scan using origin_fk_index on origin  (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 rows=6133105 loops=1)"
"                    ->  Hash  (cost=34462.73..34462.73 rows=14382 width=4) (actual time=6.668..6.668 rows=3198 loops=1)"
"                          Buckets: 2048  Batches: 1  Memory Usage: 113kB"
"                          ->  Bitmap Heap Scan on magnitude  (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 rows=3198 loops=1)"
"                                Recheck Cond: (magnitude >= 7.2)"
"                                ->  Bitmap Index Scan on mag_index  (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 loops=1)"
"                                      Index Cond: (magnitude >= 7.2)"
"              ->  Index Scan using mag_fkey_index on magnitude  (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 loops=2246)"
"                    Index Cond: (origin.id = event.magnitude.origin_id)"
"Total runtime: 17799.669 ms"

This query runs in Oracle in 1 second while takes 16 seconds in postgres, The difference tells me that I am doing something wrong somewhere. This is a new installation on a local Mac machine with 12G of RAM.

I have:

effective_cache_size=4096MB   
shared_buffer=2048MB  
work_mem=100MB  


  1. Remove DISTINCT from your SELECT clause. It's redundant since you are selecting PRIMARY KEY columns anyway.

  2. Change the LEFT JOIN into the INNER JOIN. As others mentioned, your EXISTS clause eliminates all possible NULL outputs generated by the LEFT JOIN.

  3. Create two indexes:

    magnitude (magnitude, origin_id)
    magnitude (origin_id, magnitude)
    

    (each of them may be more efficient depending of how selective is your magnitude >= 7.2 conditions)

 

SELECT  EVENT.ID, 
        ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN, 
        EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE, 
        ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
        ORIGIN.CONTRIBUTOR OCONTRIBUTOR, 
        MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
FROM    event.event
JOIN    event.origin
ON      origin.eventid = event.id
JOIN    event.magnitude
ON      magnitude.origin_id = origin.id
WHERE   origin.id IN
        (
        SELECT  origin_id
        FROM    magnitude
        WHERE   magnitude >= 7.2
        )
ORDER BY
        ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
        EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID

A sample query to check:

WITH    event (id) AS
        (
        VALUES
                (1)
        ),
        origin (id, eventid) AS
        (
        VALUES
                (1, 1),
                (2, 1)
        ),
        magnitude (id, origin_id, magnitude) AS
        (
        VALUES
                (1, 1, 4),
                (2, 1, 8),
                (3, 3, 6)
        )
SELECT  *
FROM    event
LEFT JOIN
        origin
ON      origin.eventid = event.id
LEFT JOIN
        magnitude
ON      magnitude.origin_id = origin.id
WHERE   EXISTS
        (
        SELECT  origin_id
        FROM    magnitude 
        WHERE   magnitude.magnitude >= 7.2 
                AND origin.id = origin_id
        ) 

Both magnitudes, 4 and 8, are returned for origin 1.


As has already been answered in at least one of the other forums you posted this question in, change the " left join event.origin on event.id = origin.eventid " part to be an inner join, since your EXISTS query further down restricts it to that anyway.


There are useful links for learning how to interpret your own execution plan at the PostgreSQL wiki.


According to the plan:

Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.55 ...

The sort operator was the main bottleneck. Try removing the order by statement?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜