How to extract one file with commit history from a Git repo with index-filter & co?
I have a Git repo converted from SVN to Mercurial to Git, and I wanted to extract just one source file. I also had weird characters like aÌ (an encoding mismatch corrupted the Unicode ä) and spaces in the filenames.
How can I extract one file from a repository and place it at the root of the new repo?
A faster and easier-to-understand filter that accomplishes the same thing:
git filter-branch --index-filter '
git read-tree --empty
git reset $GIT_COMMIT -- $your $files $here
' \
-- --all -- $your $files $here
It seems this is not particularly easy, which is why I am answering my own question despite the many similar questions about git filter-branch with --index-filter, --subdirectory-filter or --tree-filter: I needed all of them to achieve this!
First, a quick note: even a spell like the one in a comment on "Splitting a set of files within a git repo into their own repository, preserving relevant history",
SPELL='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v "trie.lisp" | tr "\n" "\0" | xargs -0 git rm --cached -r --ignore-unmatch'
git filter-branch --prune-empty --index-filter "$SPELL" -- --all
will not help with files named like imaging/DrinkkejaI<0300>$'\302\210'.txt_74x2032.gif. The aI<0300>$'\302\210' part once was a single letter: ä.
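One likely reason (my assumption, not something stated above) is that git ls-tree without -z prints unusual path names quoted and escaped, so the names handed to git rm no longer match the real paths. A minimal sketch in a throwaway repo; the file name and the demo identity are made up for illustration:

```shell
set -e
# Hypothetical identity so the commit works without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Build a name with a space and a non-ASCII letter (UTF-8 bytes for "café menu.txt").
name=$(printf 'caf\303\251 menu.txt')
echo data > "$name"
git add . && git commit -q -m "odd name"

# Default output: the path is wrapped in double quotes with octal escapes.
quoted=$(git -c core.quotePath=true ls-tree -r --name-only HEAD)
# With -z: NUL-terminated and not quoted, so the bytes are the real path.
raw=$(git ls-tree -r -z --name-only HEAD | tr '\0' '\n')

printf 'default: %s\nwith -z: %s\n' "$quoted" "$raw"
```

If that is what happens in your repo too, listing paths with -z (and filtering with NUL-aware tools) may be worth trying before giving up on the index-filter approach.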
So, to extract a single file, in addition to the --index-filter run above I also needed:
git filter-branch -f --subdirectory-filter lisp/source/model HEAD
Alternatively, you can use --tree-filter (the test is needed because the file was in another directory earlier; see: How can I move a directory in a Git repo for all commits?):
MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'
git filter-branch --tree-filter "$MV_FILTER" -- --all
To see all the names a file has had, use:
git log --pretty=oneline --follow --name-only git-path/to/file | grep -v ' ' | sort -u
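To see what that pipeline prints, here is a small sketch on a throwaway repo where a file is moved once. The directory names and the demo identity are made up. Note that grep -v ' ' drops every line containing a space, so it would also hide a path that itself contains spaces.

```shell
set -e
# Hypothetical identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

# First commit: the file lives in a subdirectory.
mkdir -p old/dir
printf 'line 1\nline 2\nline 3\n' > old/dir/trie.lisp
git add . && git commit -q -m "add file"

# Second commit: the file is moved to the root.
git mv old/dir/trie.lisp trie.lisp
git commit -q -m "move to root"

# Same pipeline as above: the "hash subject" lines contain a space and are
# filtered out, which leaves only the path names.
names=$(git log --pretty=oneline --follow --name-only -- trie.lisp | grep -v ' ' | sort -u)
echo "$names"
```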
As described at http://whileimautomaton.net/2010/04/03012432
Also run these clean-up steps afterwards:
$ git reset --hard
$ git gc --aggressive
$ git prune
$ git remote rm origin # Otherwise changes will be pushed to where the repo was cloned from
Note that things get much easier if you combine this with the additional step of moving the desired file(s) into a new directory.
This might be a quite common use case (e.g. moving the desired single file to the root dir).
I did it (using git 1.9) like this (first moving the file(s), then deleting the old tree):
git filter-branch -f --tree-filter 'mkdir -p new_path && git mv -k -f old_path/to/file new_path/'
git filter-branch -f --prune-empty --index-filter 'git rm -r --cached --ignore-unmatch old_path'
You can even easily use wildcards for the desired files (without messing around with grep -v).
I'd think that this ('mv' and 'rm') could also be done in one filter-branch, but it didn't work for me.
I didn't try it with weird characters but I hope this helps anyway. Making things easier seems always to be a good idea to me.
Hint:
This is a time-consuming action on large repos. So if you want to do several actions (like getting a bunch of files and then rearranging them in 'new_path/subdirs'), it's a good idea to do the 'rm' part as soon as possible to get a smaller and faster tree.
I've found an elegant solution using git log and git am here: https://www.pixelite.co.nz/article/extracting-file-folder-from-git-repository-with-full-git-history/
In case it goes away, here's how you do it:
in the original repo,
git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > /tmp/patch
if the file was in a subdirectory, or if you want to rename it
sed -i -e 's/deep\/path\/that\/you\/want\/shorter/short\/path/g' /tmp/patch
in a new, empty repo
git am < /tmp/patch
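Here is the whole round trip as a sketch on two throwaway repos, in case you want to try it before touching real data. The paths, file names and demo identity are made up; I write the sed output to a second file instead of using -i, only because -i behaves differently between GNU and BSD sed.

```shell
set -e
# Hypothetical identity so commit and am work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
src="$work/src"; dst="$work/dst"; patch="$work/patch"

# Original repo: one file we want, one we do not, two commits.
git init -q "$src"
cd "$src"
mkdir -p deep/path
echo one > deep/path/file.txt
echo x > unrelated.txt
git add . && git commit -q -m "add files"
echo two >> deep/path/file.txt
git commit -q -am "update file"

# Export the history of the one file as a mailbox of patches.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- deep/path/file.txt > "$patch"

# Rewrite the path so the file ends up at the root of the new repo.
sed -e 's/deep\/path\/file\.txt/file.txt/g' "$patch" > "$patch.renamed"

# New, empty repo: replay the patches.
git init -q "$dst"
cd "$dst"
git am -q < "$patch.renamed"

count=$(git rev-list --count HEAD)
files=$(git ls-files)
echo "$count commits, tracked: $files"
```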
The following will rewrite the history and keep only commits that touch the list of files you give. You probably want to do that in a clone of your repository to avoid losing the original history.
FILES='path/to/file1 other-path/to/file2 file3'
git filter-branch --prune-empty --index-filter "
git read-tree --empty
git reset \$GIT_COMMIT -- $FILES
" \
-- --all -- $FILES
Then you can merge that new branch into your target repository via normal merge or rebase commands, according to your use case.
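If you want to try the read-tree/reset filter before running it on real history, this is a sketch on a throwaway repo. The file names and demo identity are made up, and FILTER_BRANCH_SQUELCH_WARNING is set only because newer Git versions otherwise print a warning and pause before filter-branch starts.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so the commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Three commits; only the first and the last touch keep/a.txt.
mkdir -p keep other
echo one > keep/a.txt
echo x > other/b.txt
git add . && git commit -q -m "add both files"
echo y >> other/b.txt
git commit -q -am "touch only other/b.txt"
echo two >> keep/a.txt
git commit -q -am "touch keep/a.txt"

FILES='keep/a.txt'
# Empty the index, then restore only the wanted paths from each commit.
git filter-branch --prune-empty --index-filter "
  git read-tree --empty
  git reset \$GIT_COMMIT -- $FILES
" -- --all -- $FILES >/dev/null 2>&1

commits=$(git rev-list --count HEAD)
tracked=$(git ls-files)
echo "$commits commits, tracked: $tracked"
```

The trailing -- $FILES limits the walk to commits that touch those paths, which is why the middle commit should not survive in the rewritten history.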
There is a newer command, git filter-repo, nowadays. It has more possibilities and better performance.
See the man page for details and the project page for installation.
Remove everything except src/README.md and move it to the root:
git filter-repo --path src/README.md
git filter-repo --subdirectory-filter src/
--path selects the single file and --subdirectory-filter moves the contents of that directory to the root.