Need help understanding the SQL explanation of a JOIN query versus a query with subselects

2023-02-04 01:03 问答作者：

I posted a previous question here asking about what was better, JOIN queries or queries using subselects. Link: Queries within queries: Is there a better way?

This is an extension to that question. Can somebody explain to me why I'm seeing what I'm seeing here?

Query (Subselects):

SELECT article_seq, title, synopsis, body, lastmodified_date, (SELECT type_id FROM types WHERE kbarticles.type = type_seq), status, scope, images, archived, author, owner, (SELECT owner_description FROM owners WHERE kbarticles.owner = owner_seq),  (SELECT review_date FROM kbreview WHERE kbarticles.article_seq = article_seq) FROM kbarticles WHERE article_seq = $1

Explain Analyze (Subselects)

QUERY PLAN

Index Scan using article_seq_pkey on kbarticles  (cost=0.00..32.24 rows=1 width=1241) (actual time=1.421..1.426 rows=1 loops=1)

  Index Cond: (article_seq = 1511)

  SubPlan

    ->  Seq Scan on kbreview  (cost=0.00..14.54 rows=1 width=8) (actual time=0.243..1.158 rows=1 loops=1)

          Filter: ($2 = article_seq)

    ->  Seq Scan on owners  (cost=0.00..1.16 rows=1 width=24) (actual time=0.073..0.078 rows=1 loops=1)

          Filter: ($1 = owner_seq)

    ->  Index Scan using types_type_seq_key on types  (cost=0.00..8.27 rows=1 width=24) (actual time=0.044..0.050 rows=1 loops=1)

          Index Cond: ($0 = type_seq)

Total runtime: 2.051 ms

Query (JOINs)

SELECT k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status, k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date FROM kbarticles k JOIN types t ON k.type = t.type_seq JOIN owners o ON k.owner = o.owner_seq JOIN kbreview r ON k.article_seq = r.article_seq WHERE k.article_seq = $1

Explain Analyze (JOINs)

QUERY PLAN

Nested Loop  (cost=0.00..32.39 rows=1 width=1293) (actual time=0.532..1.467 rows=1 loops=1)

  Join Filter: (k.owner = o.owner_seq)

  ->  Nested Loop  (cost=0.00..31.10 rows=1 width=1269) (actual time=0.419..1.345 rows=1 loops=1)

        ->  Nested Loop  (cost=0.00..22.82 rows=1 width=1249) (actual time=0.361..1.277 rows=1 loops=1)

              ->  Index Scan using article_seq_pkey on kbarticles k  (cost=0.00..8.27 rows=1 width=1241) (actual time=0.065..0.071 rows=1 loops=1)

                    Index Cond: (article_seq = 1511)

              ->  Seq Scan on kbreview r  (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

                    Filter: (r.article_seq = 1511)

       开发者_如何学JAVA ->  Index Scan using types_type_seq_key on types t  (cost=0.00..8.27 rows=1 width=28) (actual time=0.048..0.055 rows=1 loops=1)

              Index Cond: (t.type_seq = k.type)

  ->  Seq Scan on owners o  (cost=0.00..1.13 rows=13 width=28) (actual time=0.022..0.038 rows=13 loops=1)

Total runtime: 2.256 ms

Based on the answers given (and accepted) in my previous question, JOINs should prove to have better results. However, in all my tests, I'm seeing JOINs to have worse results by a few milliseconds. It also seems like the JOINs are riddled with nested loops. All the tables I'm JOINing are indexed.

Am I doing something that I should be doing differently? Is there something I'm missing?

These queries are logically different.

The first one:

SELECT  article_seq, title, synopsis, body, lastmodified_date,
        (
        SELECT  type_id
        FROM    types
        WHERE   kbarticles.type = type_seq
        ),
        status, scope, images, archived, author, owner,
        (
        SELECT  owner_description
        FROM    owners
        WHERE   kbarticles.owner = owner_seq
        ),
        (
        SELECT  review_date
        FROM    kbreview
        WHERE   kbarticles.article_seq = article_seq
        )
FROM    kbarticles
WHERE   article_seq = $1

The second one:

SELECT  k.article_seq, k.title, k.synopsis, k.body, k.lastmodified_date, t.type_id, k.status,
        k.scope, k.images, k.archived, k.author, k.owner, o.owner_description, r.review_date
FROM    kbarticles k
JOIN    types t
ON      k.type = t.type_seq
JOIN    owners o
ON      k.owner = o.owner_seq
JOIN    kbreview r
ON      k.article_seq = r.article_seq
WHERE   k.article_seq = $1

If there is more than one record in types, owners or kbreview, the first query will fail while the second one will return duplicates from kbarticles.

If there is no types, owners or kbreviews for a kbarticle, the first query will return a NULL in appropriate field, while the second one will just omit that record.

If the *_seq fields seem to be the PRIMARY KEY fields, there will never be duplicates and the query will never fail; in the same way if kbarticles is constrained with FOREIGN KEY references to types, owners or kbreview, there can be no missing rows.

However, JOIN operators give the optimizer more place: it can make any table leading and use more advanced JOIN techniques like HASH JOIN or MERGE JOIN which are not available if you are using subqueries.

Is this table column indexed? r.article_seq

-> Seq Scan on kbreview r (cost=0.00..14.54 rows=1 width=12) (actual time=0.267..1.175 rows=1 loops=1)

This is where most time is spend.

Given that both plans are doing the same table scans, just arranged in a different way, I'd say there's no significant difference between the two. A "nested loop" where the lower arm produces a single row is pretty much the same as a single-row subselect.

Joins are more general, since using scalar subselects won't extend to getting two columns from any of those auxiliary tables, for example.

继续阅读：join postgresql query-optimization sql subquery

Need help understanding the SQL explanation of a JOIN query versus a query with subselects

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？