Very slow bitmap heap scan in Postgres

2023-01-21 13:33 问答作者：

I have the following simple table that contains traffic measurement data:

CREATE TABLE "TrafficData"
(
  "RoadID" character varying NOT NULL,
  "DateID" numeric NOT NULL,
  "ExactDateTime" timestamp NOT NULL,
  "CarsSpeed" numeric NOT NULL,
  "CarsCount" numeric NOT NULL
)
CREATE INDEX "RoadDate_Idx" ON "TrafficData" USING btree ("RoadID", "DateID");

The column RoadID uniquely identifies the road whose data is being recorded, while DateID identifies the day of the year (1..365) of the data - basically a rounded off representation of ExactDateTime.

I have about 100.000.000 rows; there are 1.000 distinct values in the column "RoadID" and 365 distinct values in the column "DateID".

I then run the following query:

SELECT * FROM "TrafficData"
WHERE "RoadID"='Station_1'
AND "DateID">20100610 AND "DateID"<20100618;

This takes up to three mind-boggling seconds to finish, and I cannot for the life of me figure out WHY.

EXPLAIN ANALYZE gives me the following output:

Bitmap Heap Scan on "TrafficData"  (cost=104.84..9743.06 rows=2496 width=47) (actual time=35.112..2162.404 rows=2016 loops=1)
  Recheck Cond: ((("RoadID")::text = 'S开发者_运维技巧tation_1'::text) AND ("DateID" > 20100610::numeric) AND ("DateID" < 20100618::numeric))
  ->  Bitmap Index Scan on "RoadDate_Idx"  (cost=0.00..104.22 rows=2496 width=0) (actual time=1.637..1.637 rows=2016 loops=1)
        Index Cond: ((("RoadID")::text = 'Station_1'::text) AND ("DateID" > 20100610::numeric) AND ("DateID" < 20100618::numeric))
Total runtime: 2163.985 ms

My specs:

Windows 7
Postgres 9.0
4GB RAM

I'd greatly appreciate any helpful pointers!

The slow part is obviosly fetching the data from the tables, since the index access seems to be very fast. You might either optimize your RAM usage parameters (see http://wiki.postgresql.org/wiki/Performance_Optimization and http://www.varlena.com/GeneralBits/Tidbits/perf.html), or optimize the layout of the data in the table by issuing a CLUSTER command (see http://www.postgresql.org/docs/8.3/static/sql-cluster.html).

CLUSTER "TrafficData" USING "RoadDate_Idx";

should do it.

Adding to Daniel's answer, the cluster operation is a one off process that rearranged the data on disk. The intent is to get your 2000 results rows from fewer disk blocks.

As this is dummy data, being used to find out how you can quickly query it, I'd recommend reloading it, in a pattern closer to how it will be loaded as it is generated. I imagine that the data is generated one day at a time, which will effectively result in strong correlation between DateID and the location on disk. If that is the case, then I'd either cluster by DateID, or split your test data into 365 separate loads, and reload it.

Without that, and having randomly generated data, you're most likely having to perform over 2000 seeks of your disk head.

I'd also check that anything else you're running on Windows 7 isn't adding time to those reads that you don't need, such as ensuring that the blocks read do not contain virus signatures, or concurrently performing an automatically scheduled disk defragmentation (resulting in the disk head hardly ever being anywhere close to where it was last time a database block was read).

4GB RAM -> 6+ you have 100M records, which isn't large but for a desktop machine memory could matter. If this is not a desktop, I'm not sure why you would have such a small amount of memory
AND "DateID">20100610 AND "DateID"<20100618; -> DateID BETWEEN 20100611 AND 20100617;
Create an index on the DateID
Get rid of all the double quotes around field names
Instead of a VarChar, make RoadID a text field

继续阅读：indexing performance postgresql

Very slow bitmap heap scan in Postgres

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？