开发者

How do I know if the statistics of a Postgres table are up to date?

In pgAdmin, whenever a table's statistics are out-of-date, it prompts:

Running VACUUM recommended

The estimated rowcount on the table schema.table deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.

I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.

Let's say I'm making a web-based system in Java, how do I detect if a table is out-of-date, so that I can show a prompt like the one in pgAdmin?

Because of the nature of my application, here are a few rules I have to follow:

  1. I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.

  2. 开发者_C百科
  3. I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)

  4. I can't run vacuum/analyze every time to make it up-to-date.

  5. When a user selects a table, I need to show the prompt that the table is outdated when there are any updates to this table (such as drop, insert, and update) that are not reflected in pg_stats and pg_statistic.

It seems that it's not feasible by analyzing timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table hasn't been analyzed before, I can check if it has a timestamp in last_analyze to find out whether the table is up-to-date. Using this method, however, I can't detect if the table is up-to-date when there's already a timestamp. In other words, no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables is always for the first analyze (assuming the autovacuum flag is off). Therefore, I can only show the "Running VACUUM recommended" prompt for the first time.

It's also not feasible by comparing the last_analyze timestamp to the current timestamp. There might not be any updates to the table for days. And there might be tons of updates in one hour.

Given this scenario, how can I always tell if the statistics of a table are up-to-date?


Check the system catalogs.

=> SELECT schemaname, relname, last_autoanalyze, last_analyze FROM pg_stat_all_tables WHERE relname = 'accounts';
schemaname | relname  |       last_autoanalyze        | last_analyze 
------------+----------+-------------------------------+--------------
public     | accounts | 2022-11-22 07:49:16.215009+00 | 
(1 row)

=>

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW

All kinds of useful information in there:

test=# \d pg_stat_all_tables           View "pg_catalog.pg_stat_all_tables"
      Column       |           Type           | Modifiers 
-------------------+--------------------------+-----------
 relid             | oid                      | 
 schemaname        | name                     | 
 relname           | name                     | 
 seq_scan          | bigint                   | 
 seq_tup_read      | bigint                   | 
 idx_scan          | bigint                   | 
 idx_tup_fetch     | bigint                   | 
 n_tup_ins         | bigint                   | 
 n_tup_upd         | bigint                   | 
 n_tup_del         | bigint                   | 
 n_tup_hot_upd     | bigint                   | 
 n_live_tup        | bigint                   | 
 n_dead_tup        | bigint                   | 
 last_vacuum       | timestamp with time zone | 
 last_autovacuum   | timestamp with time zone | 
 last_analyze      | timestamp with time zone | 
 last_autoanalyze  | timestamp with time zone | 
 vacuum_count      | bigint                   | 
 autovacuum_count  | bigint                   | 
 analyze_count     | bigint                   | 
 autoanalyze_count | bigint                   |


You should not have to worry about vac'ing in your application. Instead, you should have the autovac process configured on your server (in postgresql.conf), and the server takes takes of VACCUM and ANALYZE processes based on its own internal statistics. You can configure how often it should run, and what the threshold variables are for it to process.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜