Postgres 9.0 slow query
I am evaluating postgres to migrate away from Oracle. The following query runs too slow, also please find the explain plan:
explain analyze select DISTINCT EVENT.ID,
ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREF开发者_如何学JAVAERRED_ORIGIN,
EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
from event.event
left join event.origin on event.id = origin.eventid
left join event.magnitude on origin.id = event.magnitude.origin_id
WHERE EXISTS(
select origin_id from event.magnitude
where magnitude.magnitude>=7.2
and origin.id=origin_id)
order by ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
"Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual time=17791.557..17799.092 rows=5517 loops=1)"
" -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.556..17792.220 rows=5517 loops=1)"
" Sort Key: origin."time", event.magnitude.magnitude, event.id, event.preferred_origin_id, origin.id, event.contributor, origin.latitude, origin.longitude, origin.depth, origin.evtype, origin.catalog, origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
" Sort Method: quicksort Memory: 968kB"
" -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039 width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
" -> Hash Semi Join (cost=34642.50..723750.23 rows=14382 width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
" Hash Cond: (origin.id = event.magnitude.origin_id)"
" -> Merge Left Join (cost=0.00..641544.72 rows=6133105 width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
" Merge Cond: (event.id = origin.eventid)"
" -> Index Scan using event_key_index on event (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 rows=3276192 loops=1)"
" -> Index Scan using origin_fk_index on origin (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 rows=6133105 loops=1)"
" -> Hash (cost=34462.73..34462.73 rows=14382 width=4) (actual time=6.668..6.668 rows=3198 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 113kB"
" -> Bitmap Heap Scan on magnitude (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 rows=3198 loops=1)"
" Recheck Cond: (magnitude >= 7.2)"
" -> Bitmap Index Scan on mag_index (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 loops=1)"
" Index Cond: (magnitude >= 7.2)"
" -> Index Scan using mag_fkey_index on magnitude (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 loops=2246)"
" Index Cond: (origin.id = event.magnitude.origin_id)"
"Total runtime: 17799.669 ms"
This query runs in Oracle in 1 second while takes 16 seconds in postgres, The difference tells me that I am doing something wrong somewhere. This is a new installation on a local Mac machine with 12G of RAM.
I have:
effective_cache_size=4096MB
shared_buffer=2048MB
work_mem=100MB
Remove
DISTINCT
from yourSELECT
clause. It's redundant since you are selectingPRIMARY KEY
columns anyway.Change the
LEFT JOIN
into theINNER JOIN
. As others mentioned, yourEXISTS
clause eliminates all possibleNULL
outputs generated by theLEFT JOIN
.Create two indexes:
magnitude (magnitude, origin_id) magnitude (origin_id, magnitude)
(each of them may be more efficient depending of how selective is your
magnitude >= 7.2
conditions)
SELECT EVENT.ID,
ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
FROM event.event
JOIN event.origin
ON origin.eventid = event.id
JOIN event.magnitude
ON magnitude.origin_id = origin.id
WHERE origin.id IN
(
SELECT origin_id
FROM magnitude
WHERE magnitude >= 7.2
)
ORDER BY
ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
A sample query to check:
WITH event (id) AS
(
VALUES
(1)
),
origin (id, eventid) AS
(
VALUES
(1, 1),
(2, 1)
),
magnitude (id, origin_id, magnitude) AS
(
VALUES
(1, 1, 4),
(2, 1, 8),
(3, 3, 6)
)
SELECT *
FROM event
LEFT JOIN
origin
ON origin.eventid = event.id
LEFT JOIN
magnitude
ON magnitude.origin_id = origin.id
WHERE EXISTS
(
SELECT origin_id
FROM magnitude
WHERE magnitude.magnitude >= 7.2
AND origin.id = origin_id
)
Both magnitudes, 4
and 8
, are returned for origin 1
.
As has already been answered in at least one of the other forums you posted this question in, change the " left join event.origin on event.id = origin.eventid " part to be an inner join, since your EXISTS query further down restricts it to that anyway.
There are useful links for learning how to interpret your own execution plan at the PostgreSQL wiki.
According to the plan:
Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.55 ...
The sort operator was the main bottleneck. Try removing the order by
statement?
精彩评论