
Git and binary data

I'm currently starting to use Git as my version control system, but I do a fair bit of web/game development, which of course requires images (binary data) to be stored. So, if my understanding is correct: if I commit an image and it changes 100 times, then when I fetch a fresh copy of that repo I'd basically be downloading all 100 revisions of that binary file?

Isn't this an issue with large repos where images change regularly? Wouldn't the initial fetch of the repo end up becoming quite large? Has anybody experienced any issues with this in the real world? I've seen a few alternatives, for instance using submodules and keeping the images in a separate repo, but that only keeps the codebase smaller; the image repo would still be huge. Basically, I'm just wondering if there's a nice solution to this.


I wouldn't call that "checkout", but yes: the first time you fetch the repository, provided the binary data is huge and incompressible, the download is going to be what it is - huge. And yes, since the law of conservation is still in effect, breaking it into submodules won't save you space or time on the initial pull of the repository.

One possible solution is still to use a separate repository together with the --depth option when cloning it. Shallow repositories have some limitations, but I don't remember exactly what, since I've never used them. Check the docs; the keyword is "shallow".

Edit: From git-clone(1):

A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches.
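
As a minimal sketch (the repository URL and depth are placeholders), a shallow clone of a separate assets-only repository would look like this:

    # Fetch only the most recent commit of a hypothetical assets repo,
    # skipping the full history of every binary file
    git clone --depth 1 https://example.com/myproject-assets.git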


What I do is make the image directories ignored/untracked, and then sync them using other, non-git tools (or just manually copy the image directory changes once, when you're talking about a lot of images that you don't need to keep completely synced).
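
A sketch of that setup, assuming the images live under an images/ directory and rsync is the non-git sync tool (both are placeholders):

    # Keep the image directory out of version control
    echo "images/" >> .gitignore

    # Sync the ignored directory with a non-git tool such as rsync
    # (the server path is hypothetical)
    rsync -av images/ backup-server:/srv/myproject/images/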


Unfortunately, Git is not really made for storing binary data. Because it is distributed, you pull all versions of all files whenever you clone a repository. It also becomes ridiculously difficult to prune those large binary files out of your code repository afterwards. More about that here: http://www.somethingorothersoft.com/2009/09/08/the-definitive-step-by-step-guide-on-how-to-delete-a-directory-permanently-from-git-on-widnows-for-dumbasses-like-myself/
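
To give a sense of how involved that pruning is, it requires a full history rewrite along these lines (a sketch; the images/ path is a placeholder, and since this rewrites every commit it should be treated with care):

    # Rewrite all branches, removing the images/ directory from every commit
    git filter-branch --index-filter \
        'git rm -r --cached --ignore-unmatch images/' \
        --prune-empty -- --all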

I would recommend trialling it, but keep the binary files separate from the code (e.g. using submodules), as sketched below. That way, if it doesn't work out for you, you can switch to another solution without rewriting the whole history of your main repository.
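
A sketch of that split, assuming a hypothetical separate assets repository:

    # Track the image repo as a submodule at assets/ in the main repo
    git submodule add https://example.com/myproject-assets.git assets
    git commit -m "Add image assets as a submodule"

    # Collaborators then fetch it alongside the main repo
    git clone --recurse-submodules https://example.com/myproject.git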


There is a discussion of large-file storage with Git here: http://blog.deveo.com/storing-large-binary-files-in-git-repositories/

I came across this SO question as part of my research, and I thought I would point folks to the blog entry I've already reviewed (spoiler alert: they recommend git-annex for non-Windows users).
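
For flavour, basic git-annex usage looks roughly like this (the file name is a placeholder; git-annex keeps the large content outside normal Git history and checks in a pointer symlink instead):

    # One-time setup in an existing repository
    git annex init

    # Add a large binary through the annex rather than plain git add
    git annex add images/big-texture.png
    git commit -m "Add texture via git-annex"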
