Simple aggregating query very slow in PostgreSQL, any way to improve?

2022-12-31 07:41

I have a table which holds files and their types such as

CREATE TABLE files (
    id          SERIAL PRIMARY KEY, 
    name        VARCHAR(255),
    filetype    VARCHAR(255),
    ...
);

and another table for holding file properties such as

CREATE TABLE properties (
    id          SERIAL PRIMARY KEY, 
    file_id     INTEGER CONSTRAINT fk_files REFERENCES files(id),
    size        INTEGER,
    ... -- other property fields
);

The file_id field has an index.

The files table has around 800k rows, and the properties table around 200k (not all files necessarily have or need a properties row).

I want to run aggregate queries, for example to find the average size and standard deviation for each file type. But it's very slow: around 70 seconds for that query. I understand it needs a sequential scan, but that still seems like too much. Here's the query:

SELECT f.filetype, avg(size), stddev(size) FROM files as f, properties as pr 
 WHERE f.id = pr.file_id GROUP BY f.filetype;

and the EXPLAIN ANALYZE output:

 HashAggregate  (cost=140292.20..140293.94 rows=116 width=13) (actual time=74013.621..74013.954 rows=110 loops=1)
   ->  Hash Join  (cost=6780.19..138945.47 rows=179564 width=13) (actual time=1520.104..73156.531 rows=179499 loops=1)
         Hash Cond: (f.id = pr.file_id)
         ->  Seq Scan on files f  (cost=0.00..108365.41 rows=1140941 width=9) (actual time=0.998..62569.628 rows=805270 loops=1)
         ->  Hash  (cost=3658.64..3658.64 rows=179564 width=12) (actual time=1131.053..1131.053 rows=179499 loops=1)
               ->  Seq Scan on properties pr  (cost=0.00..3658.64 rows=179564 width=12) (actual time=0.753..557.171 rows=179574 loops=1)
Total runtime: 74014.520 ms

Any ideas why it is so slow/how to make it faster?

Have you defined reasonable settings for server parameters like shared_buffers, work_mem and effective_cache_size? http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
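To see what those parameters are currently set to, one option is to query the pg_settings view from a psql session. This is only a sketch: it reports the values, and what counts as "reasonable" depends on the machine's RAM and workload, which the question doesn't state.

```sql
-- Show the current values of the parameters mentioned above.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('shared_buffers', 'work_mem',
                'effective_cache_size', 'random_page_cost');
```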

In particular, I think work_mem affects how much of the hash table for the join can be kept in memory rather than spilled to disk. Also, a lower random_page_cost might steer the planner towards a merge join instead. Try temporarily setting enable_hashjoin to off and see whether that produces a plan that works better.
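A minimal sketch of that experiment, run in a single psql session. The work_mem value is an assumption for illustration only, and the query is the one from the question with the size column qualified; whether the resulting plan is actually faster is something only the EXPLAIN ANALYZE output can show.

```sql
-- Session-local experiment; nothing here changes postgresql.conf.
SET work_mem = '64MB';        -- hypothetical value, pick one that fits your RAM
SET enable_hashjoin = off;    -- discourage hash joins for this session only

EXPLAIN ANALYZE
SELECT f.filetype, avg(pr.size), stddev(pr.size)
FROM   files AS f, properties AS pr
WHERE  f.id = pr.file_id
GROUP  BY f.filetype;

-- Restore the defaults afterwards.
RESET enable_hashjoin;
RESET work_mem;
```

Comparing the "actual time" figures of this plan with the original one should show whether the join strategy matters here at all.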

I don't know PostgreSQL well, but I'd

make sure filetype has an index, perhaps a covering index on filetype and id;
try rewriting the query like this:

SQL Statement

SELECT  f.filetype
        , avg(p.size) as avg_size
        , stddev(p.size) as stddev_size
FROM    files as f
        INNER JOIN (
          -- only the columns the aggregate needs
          SELECT  file_id
                  , size
          FROM    properties
        ) p ON p.file_id = f.id
GROUP BY
        f.filetype
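The index suggestion above could be sketched as follows. The index name is made up for illustration, and whether the planner actually uses the index for this query is an assumption to verify with EXPLAIN ANALYZE; index-only scans in particular require PostgreSQL 9.2 or later.

```sql
-- Hypothetical index name; the columns follow the suggestion above.
CREATE INDEX files_filetype_id_idx ON files (filetype, id);

-- Refresh planner statistics for the table.
ANALYZE files;
```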
