Different queries, same result (it seems), completly different performance... Why?

2023-02-14 05:23 问答作者：

Currently I have two different queries that return exactly the same results however, changing the parameter from which the results are being filtered make them behave in a very different manner.

Results when searching for `cartography`

query #1: 22 rows / ~860ms;

SELECT eid FROM t_entidades 
WHERE  eid IN ( 
            SELECT     eid 
            FROM       t_entidades 
            WHERE      entidade_t LIKE '%cartography%'
)
OR     eid IN (
            SELECT    entidade as eid
            FROM      t_entidade_actividade ea
            LEFT JOIN t_actividades a ON a.aid = ea.actividade
            WHERE     a.actividade LIKE '%cartography%'
)

query #2: 22 rows / ~430ms;

SELECT      eid FROM t_entidades WHERE entidade_t LIKE '%cartography%'
UNION
SELECT      entidade as eid
FROM        t_entidade_actividade ea
LEFT JOIN   t_actividades a ON a.aid = ea.actividade
WHERE       a.actividade LIKE '%cartography%'

Results when searching for `cart`

query #1: 715 rows / ~870ms;

query #2: 715 rows / ~450ms

Results when searching for `car`

query #1: never waited long enough... it seems it takes forever and over 1s would be too much

-- EXPLAIN OUTPUT:
"QUERY PLAN"
"Seq Scan on t_entidades  (cost=44997.40..219177315.47 rows=500127 width=4)"
"  Filter: ((SubPlan 1) OR (hashed SubPlan 2))"
"  SubPlan 1"
"    ->  Materialize  (cost=37712.46..38269.55 rows=40009 width=4)"
"          ->  Seq Scan on t_entidades  (cost=0.00..37515.45 rows=40009 width=4)"
"                Filter: ((entidade_t)::text ~~ '%car%'::text)"
"  SubPlan 2"
"    ->  Hash Join  (cost=36.48..7284.20 rows=298 width=4)"
"          Hash Cond: (ea.actividade = a.aid)"
"          ->  Seq Scan on t_entidade_actividade ea  (cost=0.00..5826.63 rows=378163 width=8)"
"          ->  Hash  (cost=36.46..36.46 rows=1 width=4)"
"                ->  Seq Scan on t_actividades a  (cost=0.00..36.46 rows=1 width=4)"
"                      Filter: ((actividade)::text ~~ '%car%'::text)"

query #2: 23661 rows / ~860ms

-- EXPLAIN OUTPUT:
"QUERY PLAN"
"HashAggregate  (cost=45303.48..45706.55 rows=40307 width=4)"
"  ->  Append  (cost=0.00..45202.72 rows=40307 width=4)"
"        ->  Seq Scan on t_entidades  (cost=0.00..37515.45 rows=40009 width=4)"
"              Filter: ((entidade_t)::text ~~ '%car%'::text)"
"        ->  Hash Join  (cost=36.48..7284.20 rows=298 width=4)"
"              Hash Cond: (ea.actividade = a.aid)"
"              ->  Seq Scan on t_entidade_actividade ea  (cost=0.00..5826.63 rows=378163 width=8)"
"              ->  Hash  (cost=36.46..36.46 rows=1 width=4)"
"                    ->  Seq Scan on t_actividades a  (cost=0.00..36.46 rows=1 width=4)"
"                          Filter: ((actividade)::text ~~ '%car%'::text)"

So, searching car using query #1 seems to take forever... Which is funny considering that SELECT eid FROM t_en开发者_运维技巧tidades takes only around 4s returning all 350k+ rows...

The only difference between EXPLAINs for query #1 at the different steps is that for car the following row appears: " -> Materialize (cost=37712.46..38269.55 rows=40009 width=4)"

If someone would care to explain why query #1 takes so long to execute at the last example and exactly what's happening at each step of the explain, it would be very appreciated because I never seem to get it...

This is the first postgresql execution plan I see, but it looks like the first plan is doing a table scan on t_entidades and then for each row, it does all the stuff below, including to more table scans.

In the second plan it still does the two inner scans but hashaggregates the result.

So assuming you have 100 rows in you table the first plan does 201 table scans and the second does 2. Go figure :-)

The first query is so strange, it can only confuse the queryplanner. The first subquery should not be a subquery and the second subquery has a LEFT JOIN that should be an INNER JOIN, but could also be written without a subquery at all.

The second query also has a LEFT JOIN that is actualy an INNER JOIN, check the WHERE condition.

SELECT      eid FROM t_entidades WHERE entidade_t LIKE '%cartography%'
UNION
SELECT
  entidade as eid
FROM
  t_entidade_actividade ea
    INNER JOIN   t_actividades a ON a.aid = ea.actividade
WHERE       
  a.actividade LIKE '%cartography%'

And do you have indexes on the columns aid and actividade ?

You have joins that really are unnecessary. I've come to use the rule of thumb that if I'm not actually using a field as part of the returned set, I try to use EXISTS tests instead of JOINING. Something like:

SELECT  te.[eid]
FROM    [t_entidades] AS te
WHERE   te.[entidade_t] LIKE '%cartography%'
    OR EXISTS   (
                    SELECT  1
                    FROM    [t_entidade_actividade] AS ea
                    WHERE   ea.[entidade]       =   te.[eid]
                        AND EXISTS  (
                                        SELECT  1
                                        FROM    [t_actividades] AS ta
                                        WHERE   ta.[aid]        =       ea.[actividade]
                                            AND ta.[actividade] LIKE    '%cartography%'
                                    )
                )

The plan for query#1 reads to me as:

scan t_entidades, and for each row:
1. execute subplan 1 by scanning a materialized subset (temp file?) from t_entidades
2. execute subplan 2 by examining a hash table built from scanning t_entidade_actividade

An "explain analyze" would be able to tell you how often steps 1.1 and 1.2 were actually run for the query... if the scan in step 1.1 is being done for each row from step 1, then your query time will grow O(n^2) where n is the number of rows in t_entidades, and the temp space used for each iteration of 1.1 will increase as the number of matches in that table increases.

Your query 2 is much better written, IMHO. Each of the two sets of IDs are produced in quite different ways, so put them in separate queries and use a UNION to merge them together at the end. It also cuts out the useless outer scan of t_entidades in query 1 that just passes through IDs from the where clause. (Not that it's relevant to PostgreSQL, but it also makes it clear that the two scans could be run in parallel and then merged, but never mind).

t_entidade_actividade.actividade might need an index?

继续阅读：postgresql sql

Different queries, same result (it seems), completly different performance... Why?

Results when searching for `cartography`

Results when searching for `cart`

Results when searching for `car`

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Results when searching for cartography

Results when searching for cart

Results when searching for car

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Results when searching for `cartography`

Results when searching for `cart`

Results when searching for `car`

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？