Merge git repository in subdirectory
I'd like to merge a remote git repository into my working git repository as a subdirectory. The resulting repository should contain the merged history of the two repositories, and each file of the merged-in repository should retain its history as it was in the remote repository. I tried the subtree strategy as described in "How to use the subtree merge strategy", but after following that procedure, although the resulting repository does contain the merged history of the two repositories, individual files coming from the remote one have not retained their history (git log on any of them shows only a "Merged branch..." message).
Also I don't want to use submodules because I do not want the two combined git repositories to be separate anymore.
Is it possible to merge a remote git repository in another one as a subdirectory with individual files coming from the remote repository retaining their history?
Thanks very much for any help.
EDIT: I'm currently trying out a solution that uses git filter-branch to rewrite the merged-in repository history. It does seem to work, but I need to test it some more. I'll return to report on my findings.
EDIT 2: To make myself clearer, here are the exact commands I used with git's subtree strategy, which result in the apparent loss of history for the files of the remote repository. Let A be the git repo I'm currently working in and B the git repo I'd like to incorporate into A as a subdirectory. I did the following:
git remote add -f B <url-of-B>
git merge -s ours --no-commit B/master
git read-tree --prefix=subdir/Iwant/to/put/B/in/ -u B/master
git commit -m "Merge B as subdirectory in subdir/Iwant/to/put/B/in."
After these commands, going into the directory subdir/Iwant/to/put/B/in, I see all files of B, but git log on any one of them shows just the commit message "Merge B as subdirectory in subdir/Iwant/to/put/B/in." Their file history as it is in B is lost.
What seems to work (since I'm a beginner on git I may be wrong) is the following:
git remote add -f B <url-of-B>
git checkout -b B_branch B/master # make a local branch following B's master
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&subdir/Iwant/to/put/B/in/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
git checkout master
git merge B_branch
The filter-branch command above is taken from git help filter-branch; I only changed the subdir path.
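The filter-branch recipe above can be tried end to end in a scratch directory before running it on real repositories. The following is a minimal sketch, not the asker's exact setup: the repositories a and b, the files A and B, the target path sub/dir, the make_repo helper and the demo identity are all made up for illustration. It also assumes Git 2.9 or later (for --allow-unrelated-histories) and passes a literal tab to sed through an exported TAB variable, so it does not rely on GNU sed's \t.

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create repo <dir> with two commits touching <file>.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

sandbox=$(mktemp -d)
cd "$sandbox"
make_repo a A
make_repo b B

cd a
git remote add B ../b
git fetch -q B
git checkout -q -b B_branch B/master

# Rewrite every commit of B_branch so its files live under sub/dir/.
TAB=$(printf '\t')
export TAB
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter '
    git ls-files -s |
    sed "s-${TAB}\"*-&sub/dir/-" |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD >/dev/null

git checkout -q master
git merge -q --allow-unrelated-histories -m "Merge B into sub/dir" B_branch

# Number of commits that git log -- sub/dir/B would list.
file_commits=$(git rev-list --count HEAD -- sub/dir/B)
echo "commits touching sub/dir/B: $file_commits"
```

In this sandbox both of B's original commits show up for the prefixed path, without needing --follow.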
git-subtree is a script designed for exactly this use case of merging multiple repositories into one while preserving history (and/or splitting the history of subtrees, though that seems to be irrelevant to this question). It has been distributed as part of the git tree since release 1.7.11.
To merge a repository <repo> at revision <rev> as subdirectory <prefix>, use git subtree add as follows:
git subtree add -P <prefix> <repo> <rev>
git-subtree implements the subtree merge strategy in a more user friendly manner.
The downside is that in the merged history the files are unprefixed (not in a subdirectory). Say you merge repository a into b. As a result, git log a/f1 will show you all the changes (if any) except those in the merged history. You can do:
git log --follow -- f1
but that won't show any changes other than those in the merged history.
In other words, if you don't change a's files in repository b, then you need to specify --follow and an unprefixed path. If you change them in both repositories, then you have two commands, neither of which shows all the changes.
More on it here.
After getting the fuller explanation of what is going on, I think I understand it and in any case at the bottom I have a workaround. Specifically, I believe what is happening is rename detection is being fooled by the subtree merge with --prefix. Here is my test case:
mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA
cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB
cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
git read-tree --prefix=bdir -u B/master
git commit -m "subtree merge B into bdir"
cd bdir
echo BBB>>B
git commit -a -m BBB
We make git directories a and b with several commits each. We do a subtree merge, and then we do a final commit in the new subtree.
Running gitk (in z/a) shows that the history does appear; we can see it. Running git log shows that the history does appear as well. However, looking at a specific file has a problem: git log bdir/B does not show the file's pre-merge history.
Well, there is a trick we can play. We can look at the pre-rename history of a specific file using --follow: git log --follow -- B. This is good but isn't great, since it fails to link the pre-merge history with the post-merge history.
I tried playing with -M and -C, but I wasn't able to get it to follow one specific file.
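To put numbers on the symptom described above, the test case can be scripted and the commits counted. This is a sketch under a few assumptions: the make_repo helper and the demo identity are made up for illustration, the merge uses --allow-unrelated-histories (needed on Git 2.9 and later), and the prefix is written with a trailing slash (bdir/), which older versions of read-tree required.

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create repo <dir> with two commits touching <file>.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

sandbox=$(mktemp -d)
cd "$sandbox"
make_repo a A
make_repo b B

cd a
git remote add B ../b
git fetch -q B
# Subtree merge of B into bdir/, as in the test case above.
git merge -q -s ours --no-commit --allow-unrelated-histories B/master
git read-tree --prefix=bdir/ -u B/master
git commit -q -m "subtree merge B into bdir"
echo BBB >> bdir/B
git commit -q -a -m BBB

# All commits reachable from HEAD versus those listed for the prefixed path.
all_commits=$(git rev-list --count HEAD)
path_commits=$(git rev-list --count HEAD -- bdir/B)
echo "all=$all_commits path=$path_commits"
```

Here the whole history has six commits, but only the merge commit and the later BBB commit are attributed to bdir/B; the two original commits of B are reachable yet not listed for that path.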
So, the solution, I feel, is to tell git about the rename that will be taking place as part of the subtree merge. Unfortunately git-read-tree is pretty fussy about subtree merges so we have to work through a temporary directory, but that can go away before we commit. Afterwards, we can see the full history.
First, create an "A" repository and make some commits:
mkdir -p z/a z/b
cd z/a
git init
echo A>A
git add A
git commit -m A
echo AA>>A
git commit -a -m AA
Second, create a "B" repository and make some commits:
cd ../b
git init
echo B>B
git add B
git commit -m B
echo BB>>B
git commit -a -m BB
And the trick to making this work: force Git to recognize the rename by creating a subdirectory and moving the contents into it.
mkdir bdir
git mv B bdir
git commit -a -m bdir-rename
Return to repository "A" and fetch and merge the contents of "B":
cd ../a
git remote add -f B ../b
git merge -s ours --no-commit B/master
# According to Alex Brown and pjvandehaar, newer versions of git need --allow-unrelated-histories
# git merge -s ours --allow-unrelated-histories --no-commit B/master
git read-tree --prefix= -u B/master
git commit -m "subtree merge B into bdir"
To show that they're now merged:
cd bdir
echo BBB>>B
git commit -a -m BBB
To prove the full history is preserved in a connected chain:
git log --follow B
We get the full history after doing this, but there is a catch: if you are keeping the old "b" repo around and occasionally merging from it (say it is a separately maintained third-party repo), you are in trouble, because that third party will not have done the rename. You would have to merge new changes into your renamed version of b, and I fear that will not go smoothly. But if b is going away, you win.
I wanted to
- keep a linear history without explicit merge, and
- make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.
Step 1: Rewrite history in the source repository to make it look like all files always existed below the subdirectory.
Create a temporary branch for the rewritten history.
git checkout -b tmp_subdir
Then use git filter-branch as described in "How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory?":
git filter-branch --prune-empty --tree-filter '
if [ ! -e foo/bar ]; then
mkdir -p foo/bar
git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files foo/bar
fi'
Step 2: Switch to the target repository. Add the source repository as remote in the target repository and fetch its contents.
git remote add sourcerepo .../path/to/sourcerepo
git fetch sourcerepo
Step 3: Use rebase --onto to add the commits of the rewritten source repository on top of the target repository.
# --preserve-merges was removed in Git 2.34; on newer versions use --rebase-merges instead
git rebase --preserve-merges --onto master --root sourcerepo/tmp_subdir
You can check the log to see that this really got you what you wanted.
git log --stat
Step 4: After the rebase you’re in “detached HEAD” state. You can fast-forward master to the new head.
git checkout -b tmp_merged
git checkout master
git merge tmp_merged
git branch -d tmp_merged
Step 5: Finally some cleanup: Remove the temporary remote.
git remote rm sourcerepo
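The five steps above can be rehearsed on two throwaway repositories first. This is a rough sketch, not a drop-in script: the repositories a and b, the files A and B, the make_repo helper and the demo identity are invented for the example, and foo/bar stands in for the target subdirectory as in Step 1. Because the sample history is linear, the sketch leaves out --preserve-merges (which recent Git versions no longer accept).

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create repo <dir> with two commits touching <file>.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

sandbox=$(mktemp -d)
cd "$sandbox"
make_repo a A    # target repository
make_repo b B    # source repository

# Step 1: rewrite the source history so every file lives under foo/bar.
cd b
git checkout -q -b tmp_subdir
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty --tree-filter '
    if [ ! -e foo/bar ]; then
        mkdir -p foo/bar
        git ls-tree --name-only "$GIT_COMMIT" | xargs -I files mv files foo/bar
    fi' >/dev/null

# Step 2: fetch the rewritten history into the target repository.
cd ../a
git remote add sourcerepo ../b
git fetch -q sourcerepo

# Step 3: replay the rewritten commits on top of master (leaves HEAD detached).
git rebase -q --onto master --root sourcerepo/tmp_subdir

# Step 4: fast-forward master to the rebased commits.
git checkout -q -b tmp_merged
git checkout -q master
git merge -q tmp_merged
git branch -q -d tmp_merged

# Step 5: remove the temporary remote.
git remote rm sourcerepo

total=$(git rev-list --count master)
merges=$(git rev-list --count --merges master)
file_commits=$(git rev-list --count master -- foo/bar/B)
echo "total=$total merges=$merges file_commits=$file_commits"
```

The result is a single line of four commits with no merge commit, and both of B's commits are listed for foo/bar/B without --follow.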
If you really want to stitch things together, look up grafting. You should also be using git rebase --preserve-merges --onto. There is also an option to keep the author date for the committer information.
I found the following solution workable for me. First, in project B, I create a new branch in which all files are moved into the new subdirectory, and I push this branch to origin. Next, in project A, I add and fetch the remote of B, check out the branch with the moved files, then go back to master and merge:
# in local copy of project B
git checkout -b prepare_move
mkdir subdir
git mv <files_to_move> subdir/
git commit -m 'move files to subdir'
git push origin prepare_move
# in local copy of project A
git remote add -f B_origin <remote-url>
git checkout -b from_B B_origin/prepare_move
git checkout master
git merge from_B
If I go to the subdirectory subdir, I can use git log --follow and still have the history.
I'm not a git expert, so I cannot comment whether this is a particularly good solution or if it has caveats, but so far it seems all fine.
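One caveat that can be checked quickly is how much history git log shows with and without --follow after this kind of merge. The sketch below replays the procedure on two scratch repositories; the make_repo helper, the demo identity, and the repository and file names are invented for the example, the remote branch is merged directly instead of through a local from_B branch, and --allow-unrelated-histories is assumed to be needed (Git 2.9 and later).

```shell
#!/bin/sh
set -eu
# Throwaway identity so commits work without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create repo <dir> with two commits touching <file>.
make_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$2"
    echo "$2$2" >> "$1/$2"
    git -C "$1" commit -q -a -m "$2$2"
}

sandbox=$(mktemp -d)
cd "$sandbox"
make_repo a A
make_repo b B

# In project B: move the files into subdir/ on a dedicated branch.
cd b
git checkout -q -b prepare_move
mkdir subdir
git mv B subdir/
git commit -q -m "move files to subdir"

# In project A: fetch B and merge the prepared branch.
cd ../a
git remote add B_origin ../b
git fetch -q B_origin
git merge -q --allow-unrelated-histories -m "Merge B" B_origin/prepare_move

# Commits listed for the moved file without and with rename following.
plain=$(git rev-list --count HEAD -- subdir/B)
followed=$(git log --follow --format=%s -- subdir/B | wc -l | tr -d ' ')
echo "plain=$plain followed=$followed"
```

Without --follow only the move commit is attributed to subdir/B; with --follow the two earlier commits of B appear as well, which matches the observation above.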
Have you tried adding the extra repository as a git submodule? It won't merge the history with the containing repository, in fact, it will be an independent repository.
I mention it, because you haven't.
Say you want to merge repository a into b (I'm assuming they're located alongside one another):
cd a
git filter-repo --to-subdirectory-filter a
cd ..
cd b
git remote add a ../a
git fetch a
git merge --allow-unrelated-histories a/master
git remote remove a
For this you need git-filter-repo installed (filter-branch is discouraged).
An example of merging 2 big repositories, putting one of them into a subdirectory: https://gist.github.com/x-yuri/9890ab1079cf4357d6f269d073fd9731
More on it here.
Similar to hfs' answer, I wanted to
- keep a linear history without explicit merge, and
- make it look like the files of the merged repository had always existed in the subdirectory, and as a side effect make git log -- file work without --follow.
However, I chose the more modern filter-repo (assuming the new repo exists and is checked out):
git clone git@host/repo/old.git
cd old
git checkout -b tmp_subdir
git filter-repo --to-subdirectory-filter old
cd ../new
git remote add old ../old
git fetch old
git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date
You might need to fix conflicts manually, or change the rebase command to include --merge -s recursive -X theirs if you want to try resolving them with the theirs version:
git rebase --rebase-merges --onto main --root old/tmp_subdir --committer-date-is-author-date --merge -s recursive -X theirs
You end up on a detached HEAD, so create a new branch and merge it into main. Note that modern repositories should use a "main" branch rather than a "master" branch, for more inclusive language.
git checkout -b old_merge
git checkout main
git merge old_merge
Finally, clean up:
git branch -d old_merge
git remote rm old