Different queries, same result (it seems), completly different performance... Why?
Currently I have two different queries that return exactly the same results however, changing the parameter from which the results are being filtered make them behave in a very different manner.
Results when searching for cartography
query #1: 22 rows / ~860ms;
SELECT eid FROM t_entidades
WHERE eid IN (
SELECT eid
FROM t_entidades
WHERE entidade_t LIKE '%cartography%'
)
OR eid IN (
SELECT entidade as eid
FROM t_entidade_actividade ea
LEFT JOIN t_actividades a ON a.aid = ea.actividade
WHERE a.actividade LIKE '%cartography%'
)
query #2: 22 rows / ~430ms;
SELECT eid FROM t_entidades WHERE entidade_t LIKE '%cartography%'
UNION
SELECT entidade as eid
FROM t_entidade_actividade ea
LEFT JOIN t_actividades a ON a.aid = ea.actividade
WHERE a.actividade LIKE '%cartography%'
Results when searching for cart
query #1: 715 rows / ~870ms;
query #2: 715 rows / ~450ms
Results when searching for car
query #1: never waited long enough... it seems it takes forever and over 1s would be too much
-- EXPLAIN OUTPUT:
"QUERY PLAN"
"Seq Scan on t_entidades (cost=44997.40..219177315.47 rows=500127 width=4)"
" Filter: ((SubPlan 1) OR (hashed SubPlan 2))"
" SubPlan 1"
" -> Materialize (cost=37712.46..38269.55 rows=40009 width=4)"
" -> Seq Scan on t_entidades (cost=0.00..37515.45 rows=40009 width=4)"
" Filter: ((entidade_t)::text ~~ '%car%'::text)"
" SubPlan 2"
" -> Hash Join (cost=36.48..7284.20 rows=298 width=4)"
" Hash Cond: (ea.actividade = a.aid)"
" -> Seq Scan on t_entidade_actividade ea (cost=0.00..5826.63 rows=378163 width=8)"
" -> Hash (cost=36.46..36.46 rows=1 width=4)"
" -> Seq Scan on t_actividades a (cost=0.00..36.46 rows=1 width=4)"
" Filter: ((actividade)::text ~~ '%car%'::text)"
query #2: 23661 rows / ~860ms
-- EXPLAIN OUTPUT:
"QUERY PLAN"
"HashAggregate (cost=45303.48..45706.55 rows=40307 width=4)"
" -> Append (cost=0.00..45202.72 rows=40307 width=4)"
" -> Seq Scan on t_entidades (cost=0.00..37515.45 rows=40009 width=4)"
" Filter: ((entidade_t)::text ~~ '%car%'::text)"
" -> Hash Join (cost=36.48..7284.20 rows=298 width=4)"
" Hash Cond: (ea.actividade = a.aid)"
" -> Seq Scan on t_entidade_actividade ea (cost=0.00..5826.63 rows=378163 width=8)"
" -> Hash (cost=36.46..36.46 rows=1 width=4)"
" -> Seq Scan on t_actividades a (cost=0.00..36.46 rows=1 width=4)"
" Filter: ((actividade)::text ~~ '%car%'::text)"
So, searching car
using query #1 seems to take forever... Which is funny considering that SELECT eid FROM t_en开发者_运维技巧tidades
takes only around 4s returning all 350k+ rows...
The only difference between EXPLAIN
s for query #1 at the different steps is that for car
the following row appears: " -> Materialize (cost=37712.46..38269.55 rows=40009 width=4)"
If someone would care to explain why query #1 takes so long to execute at the last example and exactly what's happening at each step of the explain, it would be very appreciated because I never seem to get it...
This is the first postgresql execution plan I see, but it looks like the first plan is doing a table scan on t_entidades and then for each row, it does all the stuff below, including to more table scans.
In the second plan it still does the two inner scans but hashaggregates the result.
So assuming you have 100 rows in you table the first plan does 201 table scans and the second does 2. Go figure :-)
The first query is so strange, it can only confuse the queryplanner. The first subquery should not be a subquery and the second subquery has a LEFT JOIN that should be an INNER JOIN, but could also be written without a subquery at all.
The second query also has a LEFT JOIN that is actualy an INNER JOIN, check the WHERE condition.
SELECT eid FROM t_entidades WHERE entidade_t LIKE '%cartography%'
UNION
SELECT
entidade as eid
FROM
t_entidade_actividade ea
INNER JOIN t_actividades a ON a.aid = ea.actividade
WHERE
a.actividade LIKE '%cartography%'
And do you have indexes on the columns aid
and actividade
?
You have joins that really are unnecessary. I've come to use the rule of thumb that if I'm not actually using a field as part of the returned set, I try to use EXISTS tests instead of JOINING. Something like:
SELECT te.[eid]
FROM [t_entidades] AS te
WHERE te.[entidade_t] LIKE '%cartography%'
OR EXISTS (
SELECT 1
FROM [t_entidade_actividade] AS ea
WHERE ea.[entidade] = te.[eid]
AND EXISTS (
SELECT 1
FROM [t_actividades] AS ta
WHERE ta.[aid] = ea.[actividade]
AND ta.[actividade] LIKE '%cartography%'
)
)
The plan for query#1 reads to me as:
- scan t_entidades, and for each row:
- execute subplan 1 by scanning a materialized subset (temp file?) from t_entidades
- execute subplan 2 by examining a hash table built from scanning t_entidade_actividade
An "explain analyze" would be able to tell you how often steps 1.1 and 1.2 were actually run for the query... if the scan in step 1.1 is being done for each row from step 1, then your query time will grow O(n^2) where n is the number of rows in t_entidades, and the temp space used for each iteration of 1.1 will increase as the number of matches in that table increases.
Your query 2 is much better written, IMHO. Each of the two sets of IDs are produced in quite different ways, so put them in separate queries and use a UNION to merge them together at the end. It also cuts out the useless outer scan of t_entidades in query 1 that just passes through IDs from the where clause. (Not that it's relevant to PostgreSQL, but it also makes it clear that the two scans could be run in parallel and then merged, but never mind).
t_entidade_actividade.actividade
might need an index?
精彩评论