Why does the following Postgres SQL query take so long?
The raw query is as follows
SELECT "TIME", "TRADEPRICE"
FROM "YEAR" where "DATE"='2010-03-01'
and "SECURITY"='STW.AX'
AND "TIME" < '10:16:00'
AND "TYPE" = 'TRADE'
ORDER BY "TIME" ASC LIMIT 3
I have built three indices as follows
Columns "DATE" DESC NULLS LAST
Columns "SECURITY" DESC NULLS LAST
Columns "TIME" DESC NULLS LAST
I do not index TYPE because it only takes one of two possible values
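For reference, the definitions presumably look something like this (date_desc and security_desc are the index names that appear in the plan below; time_desc is a guess):
CREATE INDEX date_desc ON "YEAR" ("DATE" DESC NULLS LAST);
CREATE INDEX security_desc ON "YEAR" ("SECURITY" DESC NULLS LAST);
CREATE INDEX time_desc ON "YEAR" ("TIME" DESC NULLS LAST);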
Explain analyze produces the following
"Limit (cost=50291.28..50291.28 rows=3 width=16) (actual time=1794484.566..1794484.567 rows=3 loops=1)"
" -> Sort (cost=50291.28..50291.29 rows=4 width=16) (actual time=1794484.562..1794484.563 rows=3 loops=1)"
" Sort Key: "TIME""
" Sort Method: top-N heapsort Memory: 25kB"
" -> Bitmap Heap Scan on "YEAR" (cost=48569.54..50291.24 rows=4 width=16) (actual time=1794411.662..1794484.498 rows=20 loops=1)"
" Recheck Cond: (("SECURITY" = 'STW.AX'::bpchar) AND ("DATE" = '2010-03-01'::date))"
" Filter: (("TIME" < '10:16:00'::time without time zone) AND ("TYPE" = 'TRADE'::bpchar))"
" -> BitmapAnd (cost=48569.54..48569.54 rows=430 width=0) (actual time=1794411.249..1794411.249 rows=0 loops=1)"
" -> Bitmap Index Scan on security_desc (cost=0.00..4722.94 rows=166029 width=0) (actual time=1793917.506..1793917.506 rows=1291933 loops=1)"
" Index Cond: ("SECURITY" = 'STW.AX'::bpchar)"
" -> Bitmap Index Scan on date_desc (cost=0.00..43846.35 rows=2368764 width=0) (actual time=378.698..378.698 rows=2317130 loops=1)"
" Index Cond: ("DATE" = '2010-03-01'::date)"
"Total runtime: 1794485.224 ms"
The database is about 1 billion rows, running on a Core2Quad with 8 GB of RAM on 64-bit Ubuntu. Surely this query should not take half an hour?
It's taking half an hour because of the way you set up your indexes.
Your query has no multi-column indexes that it can use to head straight to the needed rows. It does the next best thing, which is a bitmap index scan on barely selective indexes, and top-3 sorting the resulting set.
The two indexes in question, on the security and on the date, yield 1.3M and 2.3M rows respectively. Combining them will be excruciatingly slow because you're randomly looking up over a million rows and filtering each one.
Adding insult to injury, your data structure is such that two highly correlated fields (date and time) are stored and manipulated separately. This confuses the query planner, because Postgres does not collect cross-column correlation statistics. Your queries will thus almost always end up filtering through enormous sets of rows and ordering the filtered set on a separate criterion.
I'd suggest the following changes:
Alter the table and add a datetime column of type timestamp with time zone, and combine your date and time columns into it. Drop the date and time fields accordingly, as well as the indexes on them. Also drop the index on security. (A sketch of the migration appears after this list.)
Create an index on (security, datetime). (And don't mess around with nulls first/nulls last unless your ordering criteria contain those clauses too.)
At your option, add a separate index on (datetime) or on (datetime, security), if you ever need to execute queries that do stats on all trades in a day or date range.
vacuum analyze the whole mess once you're done with the above.
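A sketch of that migration, assuming the new column is called "DATETIME" (the index name is also made up; security_desc is the index name from your plan):
-- "DATE" + "TIME" yields a timestamp; the cast to timestamptz uses the session time zone
ALTER TABLE "YEAR" ADD COLUMN "DATETIME" timestamp with time zone;
UPDATE "YEAR" SET "DATETIME" = "DATE" + "TIME";
-- dropping the columns also drops any indexes built on them
ALTER TABLE "YEAR" DROP COLUMN "DATE", DROP COLUMN "TIME";
DROP INDEX security_desc;
CREATE INDEX year_security_datetime ON "YEAR" ("SECURITY", "DATETIME");
VACUUM ANALYZE "YEAR";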
You'll then be able to rewrite your query like so:
SELECT "TIME", "TRADEPRICE"
FROM "YEAR"
WHERE '2010-03-01 00:00:00' <= "DATETIME" AND "DATETIME" < '2010-03-01 10:16:00'
AND "SECURITY"='STW.AX'
AND "TYPE" = 'TRADE'
ORDER BY "DATETIME" ASC LIMIT 3
This will yield the optimal plan: retrieving the top-3 rows from a filtered index scan on (security, datetime), which I'd expect (since you've a billion rows) to take 25 ms at most.
Add a composite index covering many of the search terms together, e.g. ON "YEAR" ("TYPE", "SECURITY", "DATE", "TIME"). Then the database can look in a single index to match all of them, instead of having to search multiple indexes and collate all the results together (the BitmapAnd of Bitmap Index Scans in your plan).
Exactly which columns to include (e.g. TYPE or not?) and what order to put them in depends on your data characteristics and on what other queries you run (since you get to re-use any left prefix of a composite index for free), so experiment a bit; but to encourage order optimisation, keep the ORDER BY column as the last used index column/direction.
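For example, a single covering index matching all the equality predicates plus the range/sort column (the index name is made up):
-- equality columns first, the range / ORDER BY column ("TIME") last
CREATE INDEX year_type_security_date_time
    ON "YEAR" ("TYPE", "SECURITY", "DATE", "TIME");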
You may also want to ANALYZE the table to update the statistics for the query planner, as some of the row-count estimates in your plan seem to be a bit off.
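For example:
ANALYZE "YEAR";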
I do not index TYPE because it only takes one of two possible values
You have to understand how indexes work, and why they work. Indexes duplicate the indexed data into lean, small index blocks that contain only the specified index columns; from your X GB of raw data, only about X/20 (a guesstimate) remains in the index. If a query references data which is not indexed, then for each record that satisfies the other criteria the DBMS has to read the corresponding raw data block in addition to the index block, just to determine whether the row matches the query.
The optimal case is that at least one index contains every single column the query needs, so there is no need to do lookups into the data blocks at all.
Another hint: It is usually a good idea to list columns that take values that are usually queried as a range (in your case "TIME") last.
My suggestion: Drop all indexes. Create a single index on the fields DATE, SECURITY, TYPE, TIME (ASC), in that order, so the range-queried column comes last (a sketch of the statement follows below). Use the query
SELECT "TIME", "TRADEPRICE"
FROM "YEAR"
WHERE "TIME" < '10:16:00'
AND "DATE"='2010-03-01'
AND "SECURITY"='STW.AX'
AND "TYPE" = 'TRADE'
And watch the incredible speed.
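A sketch of that index (the name is an assumption):
-- equality columns first, the range-queried column "TIME" last, per the hint above
CREATE INDEX year_date_security_type_time
    ON "YEAR" ("DATE", "SECURITY", "TYPE", "TIME");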