PostgreSql: different query plans with identical columns

2023-01-05 14:39 问答作者：

I have a table with 2 foreign keys, lets call them fk1 and fk2. Both have identical types, and identical indices. But when I "explain" a simple select query, I get completely different query p开发者_如何学Clans. For fk1:

explain select * from mytable where fk1 = 1;

Result

Index Scan using fk1_idx on mytable  (cost=0.00..9.32 rows=2 width=4)
  Index Cond: (fk1 = 1)

For fk2

explain select * from mytable where fk2 = 1;

Result:

Bitmap Heap Scan on mytable  (cost=5.88..659.18 rows=208 width=4)
  Recheck Cond: (fk2 = 1)
  ->  Bitmap Index Scan on fk2_idx  (cost=0.00..5.83 rows=208 width=0)
        Index Cond: (fk2 = 1)

The second one seems to be more inefficient. Is it due to the fact that it potentially returns more results, and thus the more complex query pays off?

Yes, this comes down to what's called the "selectivity" of the predicate ("where ..." clause).

If the predicate is selecting only a small fraction of the rows in the table, it makes sense to fetch each one individually, with arbitrarily-ordered access to the table data, since only a few pages will be fetched.

As the number of rows to be selected increases, a bitmap scan becomes more appropriate: the index is used to determine which pages within the table are "interesting", and then those pages are scanned in the order they are placed in the table data file. This has the advantage that adjacent pages can be requested together, which is likely to be much more efficiently served by the filesystem/disk system. (Of course, this depends on the table data file being reasonably unfragmented). Since the set of actual interesting tuples within each page is not kept, just the set of pages themselves, the predicate has to be re-evaluated for each tuple in the retrieved pages: hence the "recheck cond" in the query. (One advantage of this strategy is that it allows lookups in multiple independent indices to be combined, by simply ANDing (or ORing) together the "interesting pages" bitmaps from several index lookups)

As the number of rows to be selected grows still further, the advantage of scanning the index dwindles as the likely result is seen to be that most of the table will be marked as "interesting". So eventually a plain sequential scan becomes appropriate: all the pages are walked through in order, and the index is ignored.

IIRC this tends to work out that requesting less than 15% of the table is likely an index scan, 15-50% a bitmap scan, 50%+ a seq scan. Very roughly. This is much affected by the relative settings of random_page_cost and seq_page_cost, amongst others (e.g. effective_cache_size).

Postgresql collects statistics about common values and their frequencies, and histograms of other values for each column in the database- this is used to estimate selectivity and populates the "rows" estimate you seen in EXPLAIN output. The documentation contains a description of how this is done: http://www.postgresql.org/docs/current/static/planner-stats-details.html

Yes, the query plan will be based on statistical data about the contents in the tables. Remember to run "vacuum analyze;" now and then to keep the statistics data up to data.

继续阅读：postgresql sql-execution-plan

PostgreSql: different query plans with identical columns

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

Easiest way to get words of one line from istream into a vector?

性激素六项检查的最佳时间是多久？多少钱？？

抽烟只抽炫赫门？

Infinite gtk warnings when I right click on the icon