Sorting a PostgreSQL database dump (pg_dump)
I am creating two pg_dumps, DUMP1 and DUMP2.
DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in REVERSE order of DUMP1.
Is there any way that I can sort the two dumps so that the two dump files are exactly the same (when using diff)?
I am using PHP and Linux. I tried using "sort" in Linux, but that does not work...
Thanks!
From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.
As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.
Here is a way that you can do the total comparison including schema and data.
First, compare the schema using this method.
Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).
The query below generates the COPY statements:
select 'copy (select * from '||r.relname||' order by '||
       -- order the key columns by attnum so the generated statement
       -- is itself deterministic (array_agg order is otherwise unspecified)
       array_to_string(array_agg(a.attname order by a.attnum), ',')||
       ') to STDOUT;'
from pg_class r,
     pg_constraint c,
     pg_attribute a
where r.oid = c.conrelid
  and r.oid = a.attrelid
  and a.attnum = any(c.conkey)
  and c.contype = 'p'
  and r.relkind = 'r'
group by r.relname
order by r.relname;
Running that query will give you a list of statements like copy (select * from test order by a,b) to STDOUT;
Put those all in a text file and run them through psql for each database, then compare the output files. You may need to tweak the COPY output settings.
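For example, the generated statements can be fed back into psql and the outputs compared. This is only a sketch: the database names db1/db2 are placeholders, and generate-copy.sql is assumed to contain the statement-generating query above.

```shell
#!/bin/sh
# Sketch: "db1"/"db2" are placeholder database names, and
# generate-copy.sql is assumed to hold the query shown above.
if ! command -v psql >/dev/null 2>&1; then
  echo "psql not found; this is a sketch only"
  exit 0
fi
for db in db1 db2; do
  # -A -t: unaligned, tuples-only output, so psql emits only the
  # generated COPY statements; pipe them back into psql to execute them.
  psql -X -A -t -d "$db" -f generate-copy.sql | psql -X -d "$db" > "$db.data"
done
# Identical databases produce byte-identical files.
diff -u db1.data db2.data && echo "data matches"
```

Because each COPY already orders its rows by primary key, the two output files are directly comparable with diff.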
My solution was to write my own program to post-process the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the Java default memory of 512 MB it should work with up to 10 million records per table, since the record info (primary key value, file offsets) is held in memory.
You run this little Java program, e.g. with
java -cp ./pgdumpsort.jar PgDumpSort db.sql
And you get a file named "db-sorted.sql", or specify the output file name:
java -cp ./pgdumpsort.jar PgDumpSort db.sql db-$(date +%F).sql
And the sorted data is in a file like "db-2013-06-06.sql"
Now you can create patches using diff
diff --speed-large-files -uN db-2013-06-05.sql db-2013-06-06.sql >db-0506.diff
This allows you to create incremental backups, which are usually much smaller. To restore the files you have to apply the patch to the original file using
patch -p1 < db-0506.diff
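The diff/patch round trip itself can be demonstrated without any real dumps. A self-contained sketch using two tiny stand-in files (the file names are made up):

```shell
#!/bin/sh
# Demonstrates the incremental-backup round trip with diff and patch,
# using two tiny stand-in files instead of real dump files.
set -e
tmp=$(mktemp -d)
cd "$tmp"
printf 'a\nb\nc\n'    > db-2013-06-05.sql
printf 'a\nb\nc\nd\n' > db-2013-06-06.sql
# diff exits with status 1 when the files differ, so allow that.
diff --speed-large-files -uN db-2013-06-05.sql db-2013-06-06.sql \
  > db-0506.diff || [ $? -eq 1 ]
# Restore: applying the patch to the old file reproduces the new one.
patch -p0 -s db-2013-06-05.sql < db-0506.diff
cmp db-2013-06-05.sql db-2013-06-06.sql && echo "restored"
```

Note that -p0 is used here because the diff headers contain bare file names with no directory components; with paths in the headers you would strip a leading component with -p1 as above.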
(Source code is inside of the JAR file)
If
- performance is less important than order,
- you only care about the data, not the schema,
- and you are in a position to recreate both dumps (you don't have to work with existing dumps),
then you can dump the data in CSV format in a deterministic order like this:
COPY (select * from your_table order by some_col) to stdout
with csv header delimiter ',';
See the documentation for COPY (v14).
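From the shell, such a statement can be run non-interactively with psql -c. A sketch only: your_db, your_table and some_col are placeholder names.

```shell
#!/bin/sh
# Sketch: your_db, your_table and some_col are placeholder names.
command -v psql >/dev/null 2>&1 || { echo "psql not found; sketch only"; exit 0; }
# -X skips psqlrc so local settings cannot change the output format.
psql -X -d your_db -c \
  "copy (select * from your_table order by some_col) to stdout with csv header delimiter ','" \
  > your_table.csv || echo "could not connect; adjust the placeholder names"
```

Repeating this per table for both databases yields CSV files that can be compared directly with diff or cmp.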
It's probably not worth the effort to parse the dump. It will be far, far quicker to restore DUMP2 into a temporary database and dump that temporary database in the right order.
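A sketch of that approach, assuming the standard PostgreSQL client tools; the dump file names and the scratch database name are placeholders:

```shell
#!/bin/sh
# Sketch: DUMP1.sql/DUMP2.sql and scratch_db are placeholder names.
command -v createdb >/dev/null 2>&1 || { echo "PostgreSQL tools not found; sketch only"; exit 0; }
createdb scratch_db &&
  psql -X -q -d scratch_db -f DUMP2.sql &&    # restore the second dump
  pg_dump scratch_db > DUMP2-redumped.sql &&  # re-dump it the same way as DUMP1
  dropdb scratch_db
# Now compare the first dump against the re-dump.
diff -u DUMP1.sql DUMP2-redumped.sql || echo "dumps differ"
```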
Here is another solution to the problem: https://github.com/tigra564/pgdump-sort
It allows sorting both DDL and DML including resetting volatile values (like sequence values) to some canonical values to minimize the resulting diff.