PostgreSQL - Why are some queries on large datasets so incredibly slow

2022-12-31 18:04

I have two types of queries I run often on two large datasets. They run much slower than I would expect them to.

The first type is a sequential scan updating all records:

Update rcra_sites Set street = regexp_replace(street,'/','','i')

rcra_sites has 700,000 records. It takes 22 minutes from pgAdmin! I wrote a vb.net function that loops through each record and sends an update query for each record (yes, 700,000 update queries!) and it runs in less than half the time. Hmmm....

The second type is a simple update with a relation and then a sequential scan:

Update rcra_sites as sites 
Set violations='No' 
From narcra_monitoring as v 
Where sites.agencyid=v.agencyid and v.found_violation_flag='N'

narcra_monitoring has 1,700,000 records. This takes 8 minutes. The query planner refuses to use my indexes. The query runs much faster if I first run set enable_seqscan = false. I would prefer that the query planner did its job on its own.

I have appropriate indexes, and I have vacuumed and analyzed. I tuned shared_buffers and effective_cache_size as best I know how so that PostgreSQL uses more memory, since I have 4GB. My hardware is pretty darn good. I am running v8.4 on Windows 7.

Is PostgreSQL just this slow? Or am I still missing something?

Possibly try reducing your random_page_cost (default: 4) compared to seq_page_cost: this will reduce the planner's preference for seq scans by making random-accesses driven by indices more attractive.
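As a rough sketch of that suggestion, the setting can be changed for the current session only and the resulting plan inspected with EXPLAIN before anything is changed in postgresql.conf. The value 2 is only an illustrative starting point, not a recommendation, and seq_page_cost is assumed to still be at its default of 1.

```sql
-- Session-local experiment; 2 is an assumed starting value, tune it for your hardware.
SET random_page_cost = 2;

-- EXPLAIN prints the plan the planner would choose without running the update.
EXPLAIN
UPDATE rcra_sites AS sites
SET violations = 'No'
FROM narcra_monitoring AS v
WHERE sites.agencyid = v.agencyid
  AND v.found_violation_flag = 'N';

-- Go back to the configured value when the experiment is over.
RESET random_page_cost;
```

If the plan switches to an index scan and the query is actually faster, the same value could then be set permanently in postgresql.conf.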

Another thing to bear in mind is that MVCC means that updating a row is fairly expensive. In particular, updating every row in a table requires doubling the amount of storage for the table, until it can be vacuumed. So in your first query, you may want to qualify your update:

UPDATE rcra_sites Set street = regexp_replace(street,'/','','i')
                  where street ~ '/'

(AFAIK PostgreSQL doesn't automatically suppress an update when the new values are identical to the old ones. IIRC a standard trigger function, suppress_redundant_updates_trigger(), was added in 8.4 to do that, but it's perhaps better to address it on the client side.)
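Both ways of skipping no-op updates might look roughly like this for the second query in the question. The built-in trigger function available from 8.4 is, as far as I know, suppress_redundant_updates_trigger(); the trigger name below is a hypothetical choice, and the first option assumes it is acceptable to leave rows that already contain 'No' untouched.

```sql
-- Option 1: filter in the query so rows that already hold 'No' are never rewritten.
-- IS DISTINCT FROM also treats NULL as "different", so NULL rows still get updated.
UPDATE rcra_sites AS sites
SET violations = 'No'
FROM narcra_monitoring AS v
WHERE sites.agencyid = v.agencyid
  AND v.found_violation_flag = 'N'
  AND sites.violations IS DISTINCT FROM 'No';

-- Option 2: let a trigger discard updates that would leave the row unchanged.
-- The trigger name z_suppress_redundant is hypothetical.
CREATE TRIGGER z_suppress_redundant
BEFORE UPDATE ON rcra_sites
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```

The trigger adds a small comparison cost to every updated row, so it only pays off when a meaningful share of updates really change nothing.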

When a row is updated, a new row version is written.

If the new row version does not fit in the same disk block, or if any indexed column changed, then every index on the table needs a new entry pointing to the new row version.

It is not just indexes on the updated data that need updating.

If you have a lot of indexes on rcra_sites, and only one or two frequently updated fields, then you might gain by separating the frequently updated fields into a table of their own.
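A minimal sketch of such a split, assuming that violations is the frequently updated field and that agencyid uniquely identifies a site. The table name rcra_site_status and the column types are hypothetical, since the question does not show the schema.

```sql
-- Hypothetical narrow table holding only the frequently updated field.
-- Column types and the uniqueness of agencyid are assumptions.
CREATE TABLE rcra_site_status (
    agencyid   text PRIMARY KEY,
    violations text
);

-- Seed it from the existing wide table.
INSERT INTO rcra_site_status (agencyid, violations)
SELECT agencyid, violations
FROM rcra_sites;

-- Later updates touch only the narrow table and its single index.
UPDATE rcra_site_status AS s
SET violations = 'No'
FROM narcra_monitoring AS v
WHERE s.agencyid = v.agencyid
  AND v.found_violation_flag = 'N';
```

Queries that need both the site details and the status would then join the two tables on agencyid.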

You can also reduce the fillfactor percentage below its default of 100, so that some of the updates can result in new rows being written to the same block, resulting in the indexes pointing to that block not needing to be updated.
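For example, the fillfactor can be lowered with a storage parameter on the table. The value 80 is an assumption chosen for illustration; the right number depends on how often rows are updated and how much extra disk space is acceptable.

```sql
-- Leave roughly 20% of each page free for new row versions (80 is an assumed value).
ALTER TABLE rcra_sites SET (fillfactor = 80);
```

The setting only affects pages written after the change. Existing pages stay packed until the table is rewritten, for example by CLUSTER, which as far as I know honors the table's fillfactor.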
