Is this the suitable scenario for Multi-column Indexes?
My programming environment is Rails 2.3 and PostgreSQL 8 (shared database on Heroku). I read http://devcenter.heroku.com/articles/postgresql-indexes#multicolumn_indexes and other related resources on the Internet before I started building my app in the generic way:
My table has two columns, A and B, both indexed. (The rows are unique in terms of the (A,B) pair.) But after I built my app, I found that I only query the table with two types of call: myTable.find_by_A_and_B(a,b) and myTable.find_by_A(a).
We are expecting to have 10,000+ entries in the table; the ratio of distinct A values to distinct B values is around 3:1. We expect that for each unique value in A there would be 1000+ rows with different values in B, and for each unique value in B there would be no more than 300 rows with different values in A.
My question is: can the current database setup (with two separate indexes) be considered "efficient" with respect to the myTable.find_by_A_and_B(a,b) call (I have no idea about the inner workings of PostgreSQL)? And would replacing the two indexes with a single multi-column index on (A,B) provide a significant speed-up?
Thank you.
P.S. In response to the comment, here is a bit more information: according to this page, http://devcenter.heroku.com/articles/database, it is running PostgreSQL 8.3.
And the following is the migration schema for myTable:
create_table :myTable do |t|
  t.string  :b
  t.integer :a
  t.boolean :c, :default => false
end
add_index :myTable, :b
add_index :myTable, :a
In recent versions of PostgreSQL multi-column indexes can be used efficiently to filter on just one of the columns. This works best on the first column, but reasonably well for the others, too.
Also, 10,000 rows is a piece of cake for PostgreSQL. Tables with millions of rows are not uncommon.
Assuming we are talking about btree indexes (the default) on integer (int4) columns, the answer is: just use one multi-column index on (a,b).
Due to the page layout on disk (similar for tables and indexes), there is quite a bit of overhead per index row. Also, due to data-alignment restrictions, an index on (a,b) will use the exact same amount of disk space as an index on just (a), on machines with MAXALIGN = 8 bytes (most 64-bit OSes).
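The alignment argument can be checked with a little arithmetic. This is only a back-of-the-envelope sketch assuming an 8-byte index tuple header and MAXALIGN = 8; the real on-disk layout involves more detail (page headers, line pointers, etc.):

```ruby
# Rough size of a single btree index tuple, assuming an 8-byte index
# tuple header and MAXALIGN = 8 (typical on 64-bit systems).
HEADER   = 8
MAXALIGN = 8

def index_tuple_size(*column_sizes)
  raw = HEADER + column_sizes.sum
  # Round up to the next multiple of MAXALIGN.
  ((raw + MAXALIGN - 1) / MAXALIGN) * MAXALIGN
end

puts index_tuple_size(4)     # index on (a): one int4    => 16
puts index_tuple_size(4, 4)  # index on (a,b): two int4  => 16
```

Both tuples pad out to 16 bytes, which is why the two-column index costs no extra disk space here.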
So, especially if you have lots of writes or limited disk space and/or RAM, your best bet is to use just one multi-column index on (a,b). Maintaining indexes on heavily written tables carries quite a cost, too.
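In Rails 2.3 terms, that advice amounts to a migration along these lines. This is a hypothetical sketch: the class name is made up, and the table/column names follow the question's schema, so adjust them to your actual table name:

```ruby
# Hypothetical migration sketch: replace the two single-column indexes
# with one multi-column index on (a, b).
class ReplaceSingleColumnIndexes < ActiveRecord::Migration
  def self.up
    remove_index :myTable, :a
    remove_index :myTable, :b
    add_index :myTable, [:a, :b]
  end

  def self.down
    remove_index :myTable, :column => [:a, :b]
    add_index :myTable, :a
    add_index :myTable, :b
  end
end
```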
Edit in response to the update on the question:
With a being integer, my answer is mostly valid. The index on (a,b) will be all or most of what you need.

Get rid of the separate index on b, as you obviously don't have queries on just b.

As b is text, the multi-column index on (a,b) cannot profit from data alignment as much as described above, but still. The greater the average length of b, the more likely you will profit from an additional index on just a. With a short b it probably doesn't pay; otherwise I would expect it to speed up myTable.find_by_A(a) by just a bit.

This will likely be faster than two separate indexes on a and b, but not by a huge margin, as Postgres can combine two indexes in a bitmap index scan. This has improved since v8.3.

Be aware that btree indexes on text only help queries with '=' (more if you run on the C locale). Read the manual about operator classes.
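For example, if you ever need prefix searches (LIKE 'abc%') on b in a non-C locale, the operator class has to be specified in raw SQL, since add_index in Rails 2.3 cannot express it. A hypothetical sketch (index and table names are made up; adjust to your schema):

```ruby
# Hypothetical sketch inside a migration: a btree index using the
# text_pattern_ops operator class, so it can support LIKE 'abc%'
# prefix queries even in a non-C locale.
execute <<-SQL
  CREATE INDEX index_mytable_on_b_pattern
  ON mytable (b text_pattern_ops)
SQL
```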
You don't have to take my word for it; run some tests with EXPLAIN ANALYZE. It is very simple and informative, and index creation for 10,000 rows is a matter of a second or so. Repeat each query a couple of times to populate the cache and get comparable results.