Algorithm improvement on a simple looking postgresql query

High-level: can I make this ORDER BY / GROUP BY on a sum any faster? (PG 8.4, FWIW, on a non-tiny table ... think O(millions of rows).)

Suppose I had a table like this:

                                 Table "public.summary"
   Column    |       Type        |                      Modifiers
-------------+-------------------+------------------------------------------------------
 ts          | integer           | not null default nextval('summary_ts_seq'::regclass)
 field1      | character varying | not null
 otherfield  | character varying | not null
 country     | character varying | not null
 lookups     | integer           | not null


Indexes:
    "summary_pk" PRIMARY KEY, btree (ts, field1, otherfield, country)
    "ix_summary_country" btree (country)
    "ix_s开发者_如何学Cummary_field1" btree (field1)
    "ix_summary_otherfield" btree (otherfield)
    "ix_summary_ts" btree (ts)

And the query I want is:

select summary.field1,
    summary.country,
    summary.ts,
    sum(summary.lookups) as lookups
from summary
where summary.country = 'za' and
    summary.ts = 1275177600
group by summary.field1, summary.country, summary.ts
order by summary.ts, lookups desc, summary.field1
limit 100;

(English: the top 100 field1's at a particular (ts, country), where 'topness' is the sum of lookups across all matching rows, regardless of the value of otherfield)

Is there anything I can really do to speed this up? Algorithmically this seems to be a full table scan kind of thing, but I might be missing something.


Any query plan for this query will have to scan every row that matches the WHERE conditions, rolling them up by the grouping conditions - that is, the amount of work is proportional to the number of input rows to the group by, not the number of result rows.

The most efficient query plan possible for a query like this is a single index scan. This ought to be possible if you build an index on (country, ts) in that order; with that index, every possible query of this form resolves to a contiguous range over the index. This will still require an in-memory sort, though - it may be possible to avoid this with a different index.
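A minimal sketch of that index, assuming the schema above (the index names are illustrative, and the commented-out variant is speculative):

    create index ix_summary_country_ts on summary (country, ts);

    -- Speculative: adding field1 to the key keeps rows within a
    -- (country, ts) range sorted by the grouping column, which may let
    -- the planner group without an extra sort. The final
    -- "order by lookups desc" still needs a sort either way.
    -- create index ix_summary_country_ts_field1
    --     on summary (country, ts, field1);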

As others have said, though, posting an execution plan is your best option.


In order to be able to suggest anything, you should post the execution plan of the query.

And "OMG Ponies" is right: limit 100 will limit the overall result to 100 rows, it will not work on individual groups!

There is a nice article in the Postgres Wiki that explains how to post a question related to a slow query:

http://wiki.postgresql.org/wiki/SlowQueryQuestions


An index on (country, ts) is your best bet (as Nick Johnson suggests). Additionally, you may want to raise work_mem if it's not set very high. You can SET it at runtime if needed (recommended if you make it very large, so the setting applies only to your session). It will help keep your sorts in memory rather than spilling to disk (if that's happening).
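For example (a sketch; '64MB' is an arbitrary illustrative value, not a tuned recommendation for this workload):

    -- applies only to the current session
    set work_mem = '64MB';
    -- ... run the query ...
    reset work_mem;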

For real help, we'll need to see an EXPLAIN ANALYZE; posting it on explain.depesz.com can make it very readable.
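That is, run the original query with EXPLAIN ANALYZE prepended to capture the actual plan and timings:

    explain analyze
    select summary.field1,
        summary.country,
        summary.ts,
        sum(summary.lookups) as lookups
    from summary
    where summary.country = 'za' and
        summary.ts = 1275177600
    group by summary.field1, summary.country, summary.ts
    order by summary.ts, lookups desc, summary.field1
    limit 100;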

