
How can one safely use a shared object database in git?

I have read in several places that it's possible to share the objects directory between multiple git repositories, e.g. with symbolic links. I would like to do this to share the object databases between several bare repositories in the same directory:

shared-objects-database/
foo.git/
  objects -> ../shared-objects-database
bar.git/
  objects -> ../shared-objects-database
baz.git/
  objects -> ../shared-objects-database

(I'm doing this because there are going to be lots of large blobs redundantly stored in each objects directory otherwise.)

My concern about this is that when using these repositories, git gc will be called automatically and cause objects which are unreachable from one repository to be pruned, making the other repositories incomplete. Is there any easy way of ensuring that this doesn't happen? For example, is there a config option that would force --no-prune to be the default for git gc, and, if so, would that be sufficient to use this setup without risking losing data?

At the moment, I've been using the objects/info/alternates mechanism to share objects between these repositories, but maintaining these pointers from each repository to all the others is a bit hacky.

(My other alternative is just to have a single bare repository, with all the branches of foo.git, bar.git and baz.git named foo-master, foo-testing, bar-master, etc. However, that'd be a bit more work to manage, so if the symlinked objects directory can work safely, I'd rather do that.)

You might guess that this is one of those Using Git For What It Was Not Intended use cases, but I hope the question is clear and valid nonetheless ;)


Perhaps this was added to git after this question was asked/answered: it seems there is now a way to do this explicitly. It's described here:

https://git.wiki.kernel.org/index.php/Git_FAQ#How_to_share_objects_between_existing_repositories.3F

How to share objects between existing repositories? Do

echo "/source/git/project/.git/objects/" > .git/objects/info/alternates

and then follow it up with

git repack -a -d -l

where the -l means that it will only put "local" objects in the pack-file (strictly speaking, it will put any loose objects from the alternate tree too, so you'll have a fully packed archive, but it won't duplicate objects that are already packed in the alternate tree).
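For the layout in the question, the two steps above might look roughly like the sketch below. The scratch directory and the name shared.git are made up for illustration, and the final count-objects call is only there to check that git picked up the alternate.

```shell
# Scratch directory with two empty bare repositories (names are hypothetical).
work=$(mktemp -d)
cd "$work"
git init --quiet --bare shared.git   # holds the objects the others borrow
git init --quiet --bare foo.git

# Point foo.git at the shared object store; an absolute path avoids ambiguity.
echo "$work/shared.git/objects" > foo.git/objects/info/alternates

# Repack foo.git without duplicating objects already packed in the alternate.
git --git-dir=foo.git repack -a -d -l -q

# The verbose output includes an "alternate:" line for each configured alternate.
git --git-dir=foo.git count-objects -v
```

Note that this only makes foo.git borrow from shared.git; it does not by itself stop a gc in shared.git from pruning objects that foo.git still needs.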


Why not just crank the gc.pruneExpire variable up to never? It's unlikely you'll ever have loose objects 1000 years old that you don't want deleted.
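A minimal sketch of that setting, applied to each repository from the question. The repositories here are freshly created in a scratch directory just so the snippet runs on its own; in practice you would run the config line against your existing ones.

```shell
# Scratch directory standing in for the directory that holds the bare repositories.
work=$(mktemp -d)
cd "$work"

for repo in foo.git bar.git baz.git; do
    git init --quiet --bare "$repo"
    # Tell gc never to prune loose unreachable objects in this repository.
    git --git-dir="$repo" config gc.pruneExpire never
done

# Confirm the value that gc will read.
git --git-dir=foo.git config --get gc.pruneExpire
```

The setting is per repository, so any new repository that shares the object database needs it too (or it can be set once with --global, which would be a broader change than the answer suggests).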

To make sure that the things which really should be pruned do get pruned, you can keep one repo which has all the others as remotes. git gc would be quite safe in that one, since it really knows what is unreachable.
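Something like the following could serve as that housekeeping repository. It is only a sketch: all.git is a made-up name, the two source repositories are toy ones with a single empty commit each, and how all.git is wired to the shared object database is left as in the question.

```shell
# Scratch directory with two toy source repositories (hypothetical names).
work=$(mktemp -d)
cd "$work"
for name in foo bar; do
    git init --quiet "$name"
    git -C "$name" -c user.name=Example -c user.email=example@example.com \
        commit --quiet --allow-empty -m "initial commit in $name"
done

# The housekeeping repository: every other repository is added as a remote.
git init --quiet --bare all.git
for name in foo bar; do
    git --git-dir=all.git remote add "$name" "$work/$name"
done

# Fetch everything so all.git has a ref for every branch of every repository.
git --git-dir=all.git fetch --quiet --all

# gc here sees the refs of all repositories when deciding what is unreachable.
git --git-dir=all.git gc --quiet

# List the refs all.git now knows about (one per remote branch).
git --git-dir=all.git for-each-ref --format='%(refname)'
```

The fetch has to be repeated before each gc, otherwise all.git's view of what is reachable goes stale.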

Edit: Okay, I was a bit cavalier about the time limit; as is pointed out in the comments, 1000 years isn't gonna work too well, but the beginning of the epoch would, or never.
