开发者

debugging postgres 9.0.1 table corruption

I have a 9.0.1 database with two corrupted tables (one table has 20 rows, the other 140). The tables seem fine for read/select operations, but updating certain rows in the table produce error messages:

update media set updated_at = now() at time zone 'UTC';

ERROR: could not read block 2 in file "base/16384/16485": read only 0 of 8192 bytes

update media_status set updated_at = now() at time zone 'UTC';

2011-04-14 00:15:15 UTC ERROR: could not read block 3 in file "base/16384/16543": read only 0 of 8192 bytes

2011-04-14 00:15:15 UTC STATEMENT: update media_status se开发者_如何学Pythont updated_at = now() at time zone 'UTC';

Examining the corrupted files in the filesystem (linux), they aren't zero bytes: ll base/16384/16485 -rwx------ 1 postgres postgres 16384 2011-04-07 09:43 base/16384/16485

I ran a "vacuum(FULL, VERBOSE) " command and the corruption (or at least the errors on update) has disappeared. Is it expected that a "vacuum(FULL)" command would/could fix table corruption? Does that providing any clues as to what may have happened?

Is there any way to determine how/when this corruption may have occurred?

I suspect that it may have occurred during a filesystem level backup (ie pg_start_backup(), tar -czf..., pg_stop_backup()), as I performed a backup and moved the database to a different system. After restoring the files and starting postgres I began getting these errors. I have tried restoring multiple times with the same tar archive with the same results (on different systems).

Thanks, Dan


It's hard to make a definite statement about this situation, but I would investigate the storage layer. Short reads like the error message indicates usually "can't happen" on local, normally attached storage. So if you have a SAN, NAS, NFS, some nontrivial RAID configuration, or something else of interest, check the logs or error counters there.

If the files are there, then that means that it's not a corruption inside PostgreSQL.

One thing that would have been interesting to try, but it's probably too late now, is to try to read the files manually and see what happens.

The fact that VACUUM FULL fixed it was probably just because that rewrites the table completely into new files, so whatever was causing hickups with the old files is gone.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜