开发者

PostgreSql: different query plans with identical columns

I have a table with 2 foreign keys, lets call them fk1 and fk2. Both have identical types, and identical indices. But when I "explain" a simple select query, I get completely different query p开发者_如何学Clans. For fk1:

explain select * from mytable where fk1 = 1;

Result

Index Scan using fk1_idx on mytable  (cost=0.00..9.32 rows=2 width=4)
  Index Cond: (fk1 = 1)

For fk2

explain select * from mytable where fk2 = 1;

Result:

Bitmap Heap Scan on mytable  (cost=5.88..659.18 rows=208 width=4)
  Recheck Cond: (fk2 = 1)
  ->  Bitmap Index Scan on fk2_idx  (cost=0.00..5.83 rows=208 width=0)
        Index Cond: (fk2 = 1)

The second one seems to be more inefficient. Is it due to the fact that it potentially returns more results, and thus the more complex query pays off?


Yes, this comes down to what's called the "selectivity" of the predicate ("where ..." clause).

If the predicate is selecting only a small fraction of the rows in the table, it makes sense to fetch each one individually, with arbitrarily-ordered access to the table data, since only a few pages will be fetched.

As the number of rows to be selected increases, a bitmap scan becomes more appropriate: the index is used to determine which pages within the table are "interesting", and then those pages are scanned in the order they are placed in the table data file. This has the advantage that adjacent pages can be requested together, which is likely to be much more efficiently served by the filesystem/disk system. (Of course, this depends on the table data file being reasonably unfragmented). Since the set of actual interesting tuples within each page is not kept, just the set of pages themselves, the predicate has to be re-evaluated for each tuple in the retrieved pages: hence the "recheck cond" in the query. (One advantage of this strategy is that it allows lookups in multiple independent indices to be combined, by simply ANDing (or ORing) together the "interesting pages" bitmaps from several index lookups)

As the number of rows to be selected grows still further, the advantage of scanning the index dwindles as the likely result is seen to be that most of the table will be marked as "interesting". So eventually a plain sequential scan becomes appropriate: all the pages are walked through in order, and the index is ignored.

IIRC this tends to work out that requesting less than 15% of the table is likely an index scan, 15-50% a bitmap scan, 50%+ a seq scan. Very roughly. This is much affected by the relative settings of random_page_cost and seq_page_cost, amongst others (e.g. effective_cache_size).

Postgresql collects statistics about common values and their frequencies, and histograms of other values for each column in the database- this is used to estimate selectivity and populates the "rows" estimate you seen in EXPLAIN output. The documentation contains a description of how this is done: http://www.postgresql.org/docs/current/static/planner-stats-details.html


Yes, the query plan will be based on statistical data about the contents in the tables. Remember to run "vacuum analyze;" now and then to keep the statistics data up to data.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜