Indexing/Performance strategies for vast amount of the same value
Base information: This is in context to the indexing process of OpenStreetMap data. To simplify the ques开发者_高级运维tion: the core information is divided into 3 main types with value "W", "R", "N" (VARCHAR(1)
).
The table has somewhere around ~75M rows, all columns with "W" make up ~42M rows. Existing indexes are not relevant to this question.
Now the question itself: The indexing of the data is done via an procedure. Inside this procedure, there are some loops that do the following:
[...] SELECT * FROM table WHERE the_key = "W"; [...]
The results get looped again and the above query itself is also in a loop. This takes a lot of time and slows down the process massivly. An indexon the_key
is obviously useless since all the values that the index might use are the same ("W"). The script itself is running with a speed that is OK, only the SELECT
ing takes very long.
Do I
- need to create a "special" kind of index that takes this into account and makes the
SELECT
quicker? If so, which one? - need to tune some of the server parameters (they are already tuned and the result that they deliver seem to be good. If needed, I can post them)?
- have to live with the speed and simply get more hardware to gain more power (Tim Taylor grunt grunt)?
Any alternatives to the above points (except rewriting it or not using it)?
This query could use an index if you set work_mem
high enough to enable a bitmap index scan. It's quite possible, however, that the optimizer still won't choose to use it. Altogether, there isn't much to optimize about this. It looks like the surrounding looping code is in need of improvement.
First you say:
The table has somewhere around ~75M rows, all columns with "W" make up ~42M rows.
Then you say that you do
SELECT * FROM table WHERE the_key = "W";
several times in a loop and expect it to perform? It is impossible - no indexing would speed up this query - it has to return 42M rows - more than half. If you refuse to rewrite this indexing procedure to avoid querying this more than once then it is just The Daily WTF worthy.
精彩评论