
Fetch/Pull Part of Very Large Repository?

This is probably obvious and has been asked many times in different ways before, but I have not been able to find the answer after searching for some time.

Assume the following:

  • I have, say, a 500GB disk at the local end;
  • I have a 100 terabyte remote repository; therefore, cloning the entire repository is simply not feasible;
  • the working directory used to create the remote repository was composed of 1000 top-level directories DIR001, DIR002, ... DIR00N, each containing multiple subdirectories, with files only under the leaf subdirectories (e.g. DIR001/subdir1/fileA1 ... DIR001/subdir1/fileAN and DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN, ...);
  • I did NOT explicitly tag or branch directories DIR001, DIR002, ... DIR00N or anything else for that matter
  • I init a brand new local git repository

How do I efficiently pull or fetch the last committed versions of, say, DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository and nothing else?

AND

How do I fetch just the last committed version of a single file from DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository and nothing else?

AND

How do I efficiently pull or fetch a previously committed version of a subset of said files and nothing else?

Maybe fetch/pull is not the correct command for this.


The answer to "Partial cloning" can help you start experimenting with shallow clones.
But a shallow clone can only be limited:

  • to a certain depth, and/or to certain branches;
  • or even to a single commit (Git 2.5 (Q2 2015) supports fetching a single commit; see "Pull a specific commit from a remote git repository");
  • but not to certain files or directories (you can check out a single file or directory through sparse checkout, but you still have to get the full repo first!).
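
The limits listed above can be tried out on a throwaway repository. The following is a minimal sketch, not a recipe for the 100 TB case: the `file://` remote, the branch name `main`, the identity `Demo`, and the `v1`/`v2` file contents are all assumptions made up for the demo. It combines a depth-1 fetch, a sparse checkout of `DIR001/subdir2`, and a fetch of an older commit by SHA-1. Note that sparse checkout only restricts what is written to the working tree; the fetch still transfers every file of the requested commit.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "remote" with two commits (a stand-in for the real repository).
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/main   # fix the branch name for the demo
git config user.email "demo@example.com"
git config user.name "Demo"
# Server-side setting (Git 2.5+): let clients request a reachable commit by SHA-1.
git config uploadpack.allowReachableSHA1InWant true
mkdir -p DIR001/subdir2 DIR002/subdir1
echo v1 > DIR001/subdir2/fileB1
echo other > DIR002/subdir1/fileA1
git add .
git commit -q -m "first"
old=$(git rev-parse HEAD)               # remember the earlier commit
echo v2 > DIR001/subdir2/fileB1
git commit -q -am "second"

# Brand new local repository, as in the question.
git init -q "$work/local"
cd "$work/local"
git remote add origin "file://$work/remote"

# Sparse checkout: only DIR001/subdir2 is written to the working tree.
git config core.sparseCheckout true
mkdir -p .git/info
echo "DIR001/subdir2/" > .git/info/sparse-checkout

# Shallow fetch: history is cut off at the latest commit of main.
git fetch -q --depth 1 origin main
git checkout -q FETCH_HEAD
latest=$(cat DIR001/subdir2/fileB1)

# Single-commit fetch: an earlier commit, by SHA-1, again without its history.
git fetch -q --depth 1 origin "$old"
git checkout -q FETCH_HEAD
previous=$(cat DIR001/subdir2/fileB1)

echo "latest=$latest previous=$previous"
```

After the script runs, `DIR002` should be absent from the local working tree even though its content was transferred, which is exactly the limitation described above.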

The real solution, though, would be to split the huge remote repo into submodules.
See "What are Git limits" or "Git style backup of binary files" for illustrations of this kind of situation.
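
As a rough illustration of the submodule approach, here is a small sketch under stated assumptions: the repositories `dir001`, `super`, and `consumer` are hypothetical local stand-ins, and `protocol.file.allow=always` is passed only because recent Git versions refuse `file://` submodule clones by default. The point it shows is that a consumer clones a small superproject and then downloads only the one submodule it needs.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
# Only needed for this local file:// demo; harmless on older Git versions.
allow="-c protocol.file.allow=always"

# Hypothetical standalone repository holding what used to be DIR001.
git init -q "$work/dir001"
cd "$work/dir001"
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir subdir2
echo data > subdir2/fileB1
git add .
git commit -q -m "DIR001 content"

# Superproject: records only a pointer (commit id + URL) to DIR001.
git init -q "$work/super"
cd "$work/super"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "superproject" > README
git add README
git commit -q -m "Initial commit"
git $allow submodule --quiet add "file://$work/dir001" DIR001
git commit -q -m "Add DIR001 as a submodule"

# Consumer: cloning the superproject does not download any submodule content.
git clone -q "file://$work/super" "$work/consumer"
cd "$work/consumer"
before=$(ls -A DIR001 | wc -l)          # DIR001 is still an empty directory

# Fetch just the one submodule that is needed.
git $allow submodule --quiet update --init DIR001
after=$(cat DIR001/subdir2/fileB1)

echo "entries before init: $before, content after init: $after"
```

With 1000 top-level directories, each one would become its own repository in this layout, so the cost of a fetch is bounded by the size of the submodules actually initialized.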


Update April 2015:

Git Large File Storage (LFS), announced by GitHub in April 2015, would make pull/fetch much more efficient.

The project is git-lfs (see git-lfs.github.com) and has been tested with a server supporting it: lfs-test-server.
You store only metadata in the Git repo, and the large files elsewhere.
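
To make "metadata only" concrete, here is roughly what ends up in the Git repository when a file pattern is tracked by LFS. This is an illustrative fragment, not taken from the question's repository: the `*.bin` pattern is an assumption, and the `oid` and `size` values are left as placeholders.

```text
# .gitattributes, as written by: git lfs track "*.bin"
*.bin filter=lfs diff=lfs merge=lfs -text

# What Git stores in place of a tracked file (an LFS pointer file):
version https://git-lfs.github.com/spec/v1
oid sha256:<64 hex digits identifying the real content>
size <size of the real file in bytes>
```

The real file content lives on the LFS server and is downloaded only for the versions that are actually checked out.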
