Backing up a DB with Git - a good idea?
The way I see it, dumping a PostgreSQL DB into one big SQL file and then committing and pushing to a remote Git repo can be a terrific backup solution: I get a history of all versions, hashing, secure transport, one-way operation (really hard to mess up and delete data by pushing), efficient storage (assuming no binaries), and no chance of a new image corrupting the backup (which is the risk with rsync).
Has anybody used this approach, especially with pg, and can share his/her experience? Pitfalls?
Here are the full script details on how to do this for Postgres.
Create a Backup User
The scripts presume the existence of a user called 'backup' that has access either to all databases (as a superuser) or to the specific database being backed up. The credentials are stored in the .pgpass file in the home directory. That file looks like this (presuming the password is 'secret'):
~/.pgpass
*:*:*:backup:secret
Make sure you set the correct permissions on .pgpass, or PostgreSQL will ignore it:
chmod 0600 ~/.pgpass
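If the backup role doesn't exist yet, here is a minimal sketch for creating it (the role name and password match the .pgpass entry above; depending on your schema you may need additional grants, or a superuser role if you intend to use pg_dumpall):
# Run these as a PostgreSQL superuser (e.g. the postgres OS user)
psql -U postgres -c "CREATE ROLE backup WITH LOGIN PASSWORD 'secret';"
# Read access is enough for pg_dump of a single database (PostgreSQL 9.0+ grant syntax):
psql -U postgres -d dbname -c "GRANT CONNECT ON DATABASE dbname TO backup;"
psql -U postgres -d dbname -c "GRANT SELECT ON ALL TABLES IN SCHEMA public TO backup;"
# For pg_dumpall of the whole cluster, make it a superuser instead:
# psql -U postgres -c "ALTER ROLE backup WITH SUPERUSER;"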
Backup a Single Database
This dumps a specific database.
backup.sh
#!/bin/bash
# Dump the named database into the git-tracked directory, then commit and push
cd /home/ubuntu/backupdbtogit
pg_dump -U backup dbname > backup.sql
git add .
git commit -m "backup"
git push origin master
Note: you probably don't want to use any file-splitting options for the DB dump, since any insertion or deletion will cause a 'domino' effect, shifting the contents of every subsequent file and creating more deltas/changes in Git.
Backup all Databases on this machine
This script will dump the entire database cluster (all databases):
#!/bin/bash
# Dump every database in the cluster, then commit and push
cd /home/ubuntu/backupdbtogit
pg_dumpall -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master
Note: the same caveat about file-splitting options applies here.
Schedule it to Run
The final step is to add this to a cron job. Run 'crontab -e' and add something like the following (runs every day at midnight):
# m h dom mon dow command
# run postgres backup to git
0 0 * * * /home/ubuntu/backupdbtogit/backup.sh
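Make sure the script is executable, or cron won't be able to run it (the path here is the one used in the crontab entry above):
chmod +x /home/ubuntu/backupdbtogit/backup.sh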
Restore
If you need to restore the database, check out the version you want to restore and then pass the dump to psql (more details on that here: http://www.postgresql.org/docs/8.1/static/backup.html#BACKUP-DUMP-RESTORE ).
For a single database:
psql dbname < infile
For the entire cluster:
psql -f infile postgres
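As a concrete sketch (assuming the repo layout from the scripts above, with the dump stored as backup.sql), restoring from a specific backup looks roughly like this:
cd /home/ubuntu/backupdbtogit
git log --oneline backup.sql            # find the commit for the backup you want
git checkout <commit-hash> -- backup.sql
psql dbname < backup.sql                # or: psql -f backup.sql postgres  for a pg_dumpall file
git checkout master -- backup.sql       # return the working copy to the latest backup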
None of this was particularly complicated, but it's always tedious looking up all the parts.
Crashing on Server with limited RAM
I experienced an issue with git failing on a push. This was due to git using a lot of memory, since several commits had backed up. I resolved the failure by mounting the server's git repo on my local machine (which has plenty of RAM) using sshfs and committing from my workstation. After I did this, the low-memory server resumed commits without a problem.
A better alternative is to limit the memory usage of git during the pack (from "Is there a way to limit the amount of memory that 'git gc' uses?"):
git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"
Note: I have not tried setting a memory limit yet, since I have not had the push failure problem again.
Generally, you ought to use a backup tool for doing backups, and a version control tool to do version control. They are similar, but not the same.
Some people mix the two, where for example whatever is currently in the database is essentially the version. That doesn't have to be wrong, but be clear about what you want.
If you're talking about just the schema, then you probably can't do much wrong with "backups" using Git. But if you want to back up the data, then things can get complicated. Git isn't very good with large files. You could use something like git-annex to address that, but then you need a separate backup mechanism to create the external files. Also, using "proper" backup methods such as pg_dump or WAL archiving give other advantages, such as being able to restore subsets of databases or doing point-in-time recovery.
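For context, WAL archiving is turned on in postgresql.conf with settings roughly like these (a minimal sketch; the archive directory is an assumption, and point-in-time recovery additionally needs periodic base backups and a restore_command at recovery time):
# postgresql.conf
wal_level = replica                       # 'archive' on older releases
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'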
You probably also want to back up other parts of the operating system. How do you do that? Preferably not with a version control system, because they don't preserve file permissions, timestamps, and special files very well. So it would make some sense to tie your database backup into your existing backup system.
I would definitely recommend it. Others have been doing this as well, mainly with MySQL, but I don't think there is much of a difference:
http://www.viget.com/extend/backup-your-database-in-git/
Another approach is using ZFS snapshots for backups.
http://www.makingitscale.com/2010/using-zfs-for-fast-mysql-database-backups.html
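The ZFS approach boils down to snapshotting the dataset that holds the database's data directory (a sketch; 'tank/dbdata' is an assumed dataset name, and for a consistent snapshot you generally need to quiesce or lock the database first):
zfs snapshot tank/dbdata@backup-$(date +%Y%m%d)
zfs list -t snapshot                          # list available snapshots
# zfs rollback tank/dbdata@backup-20240101    # roll back to a given snapshot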
I did this at my $day_job, but with MySQL.
I had to write a script to chunk the monolithic mysqldump file into individual files so that I could get nice diff reports, and also because git deals with smaller files better.
The script splits the monolithic SQL file into individual files for each table's schema and data.
I also had to ensure that the SQL INSERT statements are not all on one line, in order to get readable diff reports.
One advantage of keeping the dump in git is that I can run "git log --stat" to get an overview of which tables changed between revisions of the "backup".
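A rough sketch of that kind of split (not the poster's actual script: mysqldump's --skip-extended-insert puts each row in its own INSERT statement, and csplit breaks the dump at the per-table comment headers mysqldump writes):
mysqldump --skip-extended-insert -u backup dbname > dump.sql
# one file per table, split on the "-- Table structure for table ..." markers
csplit -z -f table_ dump.sql '/^-- Table structure for table/' '{*}'
git add .
git commit -m "backup"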
Try this tool for making backups. You can download your backups, you only need to configure the server and the repository, and you can also add users and privileges. You can send commits (changes to your database from the platform) and make backups per table or per whole database, just as it can push changes to the complete database or only to the table you require. I am using it and so far everything is fine.
https://hub.docker.com/r/arelis/gitdb