Tuning a subquery in Postgres

2023-03-06 06:14 Q&A author:

I have discovered some suspect data in a database. I am trying to determine whether a certain field, lastname, is correct. I have come up with the following query in Postgres:

SELECT members."memberID", 
       members.lastname 
  FROM members 
 WHERE members."memberID" NOT IN (SELECT members."memberID" 
                                    FROM members 
                                   WHERE members.lastname ~* '[a-zA-Z]+([-][a-zA-Z]+)*');

The subquery currently matches normal names and names with a hyphen. The outer query should display the members who don't match that pattern. Currently the query takes an incredibly long time to run (I've never seen it complete). I am not sure why it takes so long or how to improve it.
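It is worth checking what the pattern actually accepts. The sketch below uses Python's `re` module as a stand-in for PostgreSQL's `~*` (an assumption: for a simple bracket-and-quantifier pattern like this the two engines agree, and `re.search` with `re.IGNORECASE` mirrors `~*` in being case-insensitive and matching anywhere in the string). Because the pattern has no `^`/`$` anchors, any lastname containing at least one ASCII letter matches, so the NOT IN query can only ever report names with no letters at all. The anchored variant and the sample names are hypothetical. (A mistyped range such as `A-z` would additionally accept `[`, `\`, `]`, `^`, `_` and the backtick, which sit between `Z` and `a` in ASCII.)

```python
import re

# Pattern as used in the queries: note there are no ^ / $ anchors.
UNANCHORED = r"[a-zA-Z]+([-][a-zA-Z]+)*"
# Hypothetical fix: anchor it so the WHOLE lastname must have this shape.
ANCHORED = r"^[a-zA-Z]+([-][a-zA-Z]+)*$"


def matches(pattern: str, lastname: str) -> bool:
    """Mimic PostgreSQL's `lastname ~* pattern`: case-insensitive, match anywhere."""
    return re.search(pattern, lastname, re.IGNORECASE) is not None


# Hypothetical sample values, not taken from the original data.
samples = ["Smith", "Smith-Jones", "O'Brien", "Smith2", "-Smith", "123", "_"]

for name in samples:
    print(f"{name!r:15} unanchored={matches(UNANCHORED, name)} "
          f"anchored={matches(ANCHORED, name)}")
```

Only "Smith" and "Smith-Jones" pass the anchored version; every sample with a letter in it passes the unanchored one.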

NOT EXISTS

SELECT m."memberID", 
       m.lastname 
  FROM MEMBERS m 
 WHERE NOT EXISTS (SELECT NULL
                     FROM MEMBERS b
                    WHERE b.lastname ~* '[a-zA-Z]+([-][a-zA-Z]+)*'
                      AND b."memberID" = m."memberID");

LEFT JOIN / IS NULL

   SELECT m."memberID", 
          m.lastname 
     FROM MEMBERS m 
LEFT JOIN MEMBERS b ON b."memberID" = m."memberID"
                   AND b.lastname ~* '[a-zA-Z]+([-][a-zA-Z]+)*'
    WHERE b."memberID" IS NULL;
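As a sanity check that the two rewrites are interchangeable, here is a sketch using Python's built-in `sqlite3` module instead of PostgreSQL. SQLite has no `~*` operator, so the sketch registers a user function named `regexp`, which SQLite calls for `X REGEXP Y`; that stand-in, the sample rows, and the anchored pattern are all assumptions. It compares results only and says nothing about PostgreSQL's plans.

```python
import re
import sqlite3

# Hypothetical anchored pattern: the whole lastname must have the expected shape.
PATTERN = r"^[a-zA-Z]+([-][a-zA-Z]+)*$"


def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for `X REGEXP Y`. Return NULL for a NULL
    # lastname, the way a SQL comparison would.
    if value is None:
        return None
    return re.search(pattern, value, re.IGNORECASE) is not None


def setup():
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    conn.execute('CREATE TABLE members ("memberID" INTEGER PRIMARY KEY, lastname TEXT)')
    conn.executemany(
        "INSERT INTO members VALUES (?, ?)",
        [(1, "Smith"), (2, "Smith-Jones"), (3, "O'Brien"), (4, "Smith2"), (5, None)],
    )
    return conn


NOT_EXISTS = """
SELECT m."memberID", m.lastname
  FROM members m
 WHERE NOT EXISTS (SELECT NULL
                     FROM members b
                    WHERE b.lastname REGEXP ?
                      AND b."memberID" = m."memberID")
 ORDER BY m."memberID"
"""

LEFT_JOIN = """
   SELECT m."memberID", m.lastname
     FROM members m
LEFT JOIN members b ON b."memberID" = m."memberID"
                   AND b.lastname REGEXP ?
    WHERE b."memberID" IS NULL
 ORDER BY m."memberID"
"""

conn = setup()
not_exists_rows = conn.execute(NOT_EXISTS, (PATTERN,)).fetchall()
left_join_rows = conn.execute(LEFT_JOIN, (PATTERN,)).fetchall()
print(not_exists_rows)
print(left_join_rows)
```

Both forms return the same three members, including the one whose lastname is NULL.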

Summary

Quote:

PostgreSQL treats LEFT JOIN and NOT EXISTS equally, using the same execution plan for both of them (namely a Hash Anti Join for the example above).

As for NOT IN, which is semantically different because its logic is three-valued and it can return NULL, PostgreSQL tries to take this into account and limits itself to using a filter against a subplan (a hashed subplan for a hashable result set, as in the example above).
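To see the three-valued behaviour concretely, here is a small sketch using Python's `sqlite3` module rather than PostgreSQL (an assumption: SQLite follows the same SQL NULL rules for NOT IN, although its planner is entirely different, so this says nothing about performance). The tables `outer_t` and `inner_t` are hypothetical. With a NULL in the subquery result, NOT IN returns no rows at all, while NOT EXISTS still returns 2 and 3. In the question the subquery selects `"memberID"`, which is presumably a non-null key, so the results would not differ there; the planner still has to allow for the possibility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outer_t (id INTEGER);
CREATE TABLE inner_t (id INTEGER);
INSERT INTO outer_t VALUES (1), (2), (3);
INSERT INTO inner_t VALUES (1), (NULL);
""")

# NOT IN: 2 and 3 are "not found, but a NULL is present", so the predicate is NULL.
not_in = conn.execute(
    "SELECT id FROM outer_t WHERE id NOT IN (SELECT id FROM inner_t) ORDER BY id"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; NULLs never match.
not_exists = conn.execute(
    "SELECT o.id FROM outer_t o "
    "WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE i.id = o.id) ORDER BY o.id"
).fetchall()

# Evaluate the predicate directly: 2 NOT IN (1, NULL) is NULL, not TRUE.
verdict = conn.execute("SELECT 2 NOT IN (1, NULL)").fetchone()[0]
print(not_in, not_exists, verdict)
```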

Since it needs to search the hash table twice for each missing value (first to find the value, then to find a NULL), this method is a little less efficient.
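The "search twice" cost can be pictured with a toy model. `HashedNotIn` below is a hypothetical Python class, not PostgreSQL's executor code: it hashes the subquery output once, and each outer value then costs one probe if it is found, or two (the value, then the NULL check) if it is missing. Real PostgreSQL tracks NULL-containing rows in its own structure; a single flag is a simplification for one column.

```python
from typing import Hashable, Iterable, List, Optional


class HashedNotIn:
    """Toy model of NOT IN evaluated against a hashed subplan (not PostgreSQL source).

    The subquery result is hashed once; each outer value then costs one probe,
    plus a second check for NULL when the value is missing.
    """

    def __init__(self, subquery_values: Iterable[Optional[Hashable]]) -> None:
        values = list(subquery_values)
        self.table = {v for v in values if v is not None}
        self.has_null = any(v is None for v in values)
        self.probes = 0  # number of hash-table checks performed

    def __call__(self, value: Hashable) -> Optional[bool]:
        """Return True, False, or None (SQL NULL) for `value NOT IN (...)`."""
        self.probes += 1  # first check: is the value present?
        if value in self.table:
            return False
        self.probes += 1  # second check: did the subquery produce a NULL?
        return None if self.has_null else True


check = HashedNotIn([1, 2, 3])
results: List[Optional[bool]] = [check(v) for v in [1, 4, 5]]
print(results, check.probes)
```

One value is found (one probe) and two are missing (two probes each), for five probes in total.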

A plain subplan, which the optimizer can resort to whenever it decides the list will not fit into memory, is very inefficient, and queries that might end up using it should be avoided like the plague.
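For contrast, here is a toy model of a plain (non-hashed) subplan. The hypothetical function `plain_subplan_not_in` rescans the whole inner list for every outer row, so the work grows with the product of the two sizes. It only illustrates the growth rate; it is not how PostgreSQL's executor is written.

```python
from typing import Iterable, List, Optional, Tuple


def plain_subplan_not_in(outer: Iterable[int],
                         inner: List[Optional[int]]) -> Tuple[List[int], int]:
    """Toy model of NOT IN via a plain (non-hashed) subplan.

    For every outer row the inner list is scanned again from the start, so the
    work grows roughly as len(outer) * len(inner). Returns the rows that pass
    and the number of comparisons made. Not PostgreSQL's actual executor code.
    """
    kept: List[int] = []
    comparisons = 0
    for value in outer:
        found = False
        saw_null = False
        for candidate in inner:  # rescan the whole subplan output
            comparisons += 1
            if candidate is None:
                saw_null = True
            elif candidate == value:
                found = True
                break
        # SQL three-valued logic: only a definite "not found, no NULL" passes.
        if not found and not saw_null:
            kept.append(value)
    return kept, comparisons


kept, comparisons = plain_subplan_not_in(range(1000), list(range(500, 1500)))
print(len(kept), comparisons)
```

With 1,000 outer rows and 1,000 inner values this makes 625,250 comparisons, whereas one hash lookup per outer row would need on the order of 1,000 probes (plus the extra NULL check for missing values).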

That’s why in PostgreSQL 8.4 one should always use LEFT JOIN / IS NULL or NOT EXISTS rather than NOT IN to find the missing values.

Addendum

But as Andrew Lazarus points out, if there are no duplicates of memberID in the MEMBERS table, the query only needs to be:

SELECT m."memberID", 
       m.lastname 
  FROM MEMBERS m 
 WHERE m.lastname !~* '[a-zA-Z]+([-][a-zA-Z]+)*';
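A sketch of the addendum's claim, again in SQLite via Python's `sqlite3`, with a user-defined `regexp` function standing in for PostgreSQL's regex operators (assumptions: `"memberID"` is the primary key, the sample rows are made up, and the pattern is anchored). With a unique key, a single scan finds the same non-matching names as NOT EXISTS, with one caveat: a NULL lastname is returned by NOT EXISTS but dropped by a bare negated match, because in PostgreSQL `NULL !~* '...'` is NULL rather than true. Adding `OR lastname IS NULL` restores those rows if you want them flagged too.

```python
import re
import sqlite3

# Hypothetical anchored pattern: the whole lastname must have the expected shape.
PATTERN = r"^[a-zA-Z]+([-][a-zA-Z]+)*$"


def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for `X REGEXP Y`; NULL in, NULL out.
    if value is None:
        return None
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute('CREATE TABLE members ("memberID" INTEGER PRIMARY KEY, lastname TEXT)')
conn.executemany("INSERT INTO members VALUES (?, ?)",
                 [(1, "Smith"), (2, "Smith-Jones"), (3, "O'Brien"), (4, None)])

not_exists = conn.execute("""
SELECT m."memberID" FROM members m
 WHERE NOT EXISTS (SELECT NULL FROM members b
                    WHERE b.lastname REGEXP ? AND b."memberID" = m."memberID")
 ORDER BY 1""", (PATTERN,)).fetchall()

# Single scan, like the addendum: NOT (NULL) is NULL, so member 4 is dropped.
single_scan = conn.execute("""
SELECT m."memberID" FROM members m
 WHERE NOT (m.lastname REGEXP ?)
 ORDER BY 1""", (PATTERN,)).fetchall()

# Single scan that also keeps NULL lastnames, matching NOT EXISTS exactly.
single_scan_with_nulls = conn.execute("""
SELECT m."memberID" FROM members m
 WHERE m.lastname IS NULL OR NOT (m.lastname REGEXP ?)
 ORDER BY 1""", (PATTERN,)).fetchall()

print(not_exists, single_scan, single_scan_with_nulls)
```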

I like OMG Ponies' answer, but if memberID is unique (i.e., the PK), you can just drop the subquery altogether.

SELECT members."memberID", 
       members.lastname 
  FROM members 
 WHERE members.lastname !~ '[a-zA-Z]+([-][a-zA-Z]+)*';

(I deleted the case-insensitive operator since the regexp covers both cases.)
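A quick check of the claim in parentheses, using Python's `re` module as a stand-in for `!~` and `!~*` (an assumption; the sample names are hypothetical and the pattern is anchored here). For these inputs the case-sensitive and case-insensitive forms reject exactly the same names. Note that a name such as "Müller" is rejected either way, since `[a-zA-Z]` covers only ASCII letters, which may or may not be what you want when hunting for suspect data.

```python
import re

# Anchored form of the answer's pattern (the anchors are a hypothetical addition).
PATTERN = r"^[a-zA-Z]+([-][a-zA-Z]+)*$"

# Hypothetical sample values.
samples = ["smith", "SMITH", "McDonald", "smith-JONES", "O'Brien", "Müller", "x2"]


def rejected(name: str, flags: int = 0) -> bool:
    """Mimic `lastname !~ pattern` (flags=0) or `!~*` (flags=re.IGNORECASE)."""
    return re.search(PATTERN, name, flags) is None


case_sensitive = [rejected(s) for s in samples]
case_insensitive = [rejected(s, re.IGNORECASE) for s in samples]
print(case_sensitive == case_insensitive)
```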
