How can one safely use a shared object database in git?
I have read in several places that it's possible to share the objects directory between multiple git repositories, e.g. with symbolic links. I would like to do this to share the object databases between several bare repositories in the same directory:
shared-objects-database/
foo.git/
    objects -> ../shared-objects-database
bar.git/
    objects -> ../shared-objects-database
baz.git/
    objects -> ../shared-objects-database
(I'm doing this because there are going to be lots of large blobs redundantly stored in each objects directory otherwise.)
My concern about this is that when using these repositories, git gc will be called automatically and cause objects which are unreachable from one repository to be pruned, making the other repositories incomplete. Is there any easy way of ensuring that this doesn't happen? For example, is there a config option that would force --no-prune to be the default for git gc, and, if so, would that be sufficient to use this setup without risking losing data?
At the moment, I've been using the objects/info/alternates mechanism to share objects between these repositories, but maintaining these pointers from each repository to all the others is a bit hacky.
(My other alternative is just to have a single bare repository, with all the branches of foo.git, bar.git and baz.git named foo-master, foo-testing, bar-master, etc. However, that'd be a bit more work to manage, so if the symlinked objects directory can work safely, I'd rather do that.)
You might guess that this is one of those Using Git For What It Was Not Intended use cases, but I hope the question is clear and valid nonetheless ;)
Perhaps this was added to git after this question was asked/answered: it seems there is now a way to do this explicitly. It's described here:
https://git.wiki.kernel.org/index.php/Git_FAQ#How_to_share_objects_between_existing_repositories.3F
How to share objects between existing repositories? Do
echo "/source/git/project/.git/objects/" > .git/objects/info/alternates
and then follow it up with
git repack -a -d -l
where the -l means that it will only put "local" objects in the pack-file (strictly speaking, it will put any loose objects from the alternate tree too, so you'll have a fully packed archive, but it won't duplicate objects that are already packed in the alternate tree).
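The two FAQ steps above could look like the following when run end to end. This is only a rough sketch: the temporary directory and the repository names source.git and foo.git are made up for illustration, and both repositories are empty, so the repack has nothing to do here.

```shell
set -e

# Hypothetical scratch layout: one repository that owns the objects
# and one that borrows from it.
work=$(mktemp -d)
git init --quiet --bare "$work/source.git"
git init --quiet --bare "$work/foo.git"

# Step 1: point foo.git at source.git's object store.
# The alternates file holds one object directory path per line.
echo "$work/source.git/objects" > "$work/foo.git/objects/info/alternates"

# Step 2: repack foo.git. -a packs everything into a single pack,
# -d deletes the packs made redundant, and -l leaves out objects
# that are already packed in the alternate.
git -C "$work/foo.git" repack -a -d -l -q

cat "$work/foo.git/objects/info/alternates"
```

Note that the borrowing repository now depends on the source repository keeping those objects, which is exactly the pruning concern raised in the question.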
Why not just crank the gc.pruneExpire variable up to never? It's unlikely you'll ever have loose objects 1000 years old that you don't want deleted.
To make sure that the things which really should be pruned do get pruned, you can keep one repo which has all the others as remotes. git gc would be quite safe in that one, since it really knows what is unreachable.
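As a rough sketch of that idea, the aggregating repository could be set up like this. The name all.git and the temporary directory are made up for illustration, and the repositories are empty, so this only shows the remote wiring.

```shell
set -e

# Hypothetical scratch layout standing in for foo.git, bar.git and baz.git.
work=$(mktemp -d)
for name in foo bar baz; do
  git init --quiet --bare "$work/$name.git"
done

# The aggregating repository (all.git is an assumed name).
git init --quiet --bare "$work/all.git"

# Register every other repository as a remote of all.git.
for name in foo bar baz; do
  git -C "$work/all.git" remote add "$name" "$work/$name.git"
done

# In real use, a "git fetch --all" here would bring every repository's
# refs into all.git before any git gc is run in it.
git -C "$work/all.git" config --get remote.foo.url
```

The aggregate only knows what is reachable as of its last fetch, so it would need to be fetched again before each gc.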
Edit: Okay, I was a bit cavalier about the time limit; as is pointed out in the comments, 1000 years isn't gonna work too well, but the beginning of the epoch would, or never.
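For what it's worth, setting that variable in each of the sharing repositories might look like the sketch below. The temporary directory is made up for illustration; the repository names are the ones from the question. This only changes the default that git gc uses when pruning loose objects; a directly invoked git prune or an explicit git gc --prune=<date> would not be covered by it.

```shell
set -e

# Hypothetical scratch layout standing in for the question's repositories.
work=$(mktemp -d)
for name in foo bar baz; do
  git init --quiet --bare "$work/$name.git"
  # Tell git gc never to expire unreachable loose objects in this repository.
  git -C "$work/$name.git" config gc.pruneExpire never
done

# Read the value back from one of them.
git -C "$work/foo.git" config --get gc.pruneExpire
```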