git: How to split off library from project? filter-branch, subtree?
So, I have a larger (closed-source) project, and in the context of this project I created a library which I think could also be useful elsewhere.
I now want to split off the library into its own project, which could be published as open source on GitHub or similar. Of course, the library (and its history there) should contain no traces of our project.
git-subtree seems like a solution here, but it does not completely fit.
My directory layout is something like this (since it is a Java project):
- fencing-game (git workdir)
  - src
    - de
      - fencing_game
        - transport (my library)
          - protocol (part of the library)
          - fencing (part of the main project interfacing with the library)
          - client (part of the main project interfacing with the library)
          - server (part of the main project interfacing with the library)
        - client (part of the main project)
        - server (part of the main project)
        - ... (part of the main project)
  - other files and directories (build system, website and such - part of the main project)
After the split, I want the library's directory layout to look like this (including any files directly in the transport and protocol directories):
- my-library (name to be determined)
  - src
    - de
      - fencing_game
        - transport (my library)
          - protocol (part of the library)
The history should also contain just the part of the main project's history which touches this part of the repository.
A first look showed me git-subtree split --prefix=src/de/fencing_game/transport, but this will
- give me a tree rooted in transport (which will not compile), and
- include the transport/client, transport/server and transport/fencing directories.
The first point could be mitigated by using git subtree add --prefix=src/de/fencing_game/transport <commit> on the receiving side, but I don't think git-subtree can do much about these subdirectories being exported as well. (The idea there really is just to be able to share the complete tree.)
Do I have to use git filter-branch here?
After the split, I want to be able to import back the library in my main project, either using git-subtree or git-submodule, in a separate subdirectory rather than where it is now. I imagine the layout this way
- fencing-game (git workdir)
  - src
    - de
      - fencing_game
        - transport (empty)
          - fencing (part of the main project interfacing with the library)
          - client (part of the main project interfacing with the library)
          - server (part of the main project interfacing with the library)
        - client (part of the main project)
        - server (part of the main project)
        - ... (part of the main project)
  - my-library
    - src
      - de
        - fencing_game
          - transport (my library)
            - protocol (part of the library)
  - other files and directories (build system, website and such - part of the main project)
I think you've got some real spelunking to do. If you just want to split off "protocol", you can do that with "git subtree split ..." or "git filter-branch ..."
git filter-branch --subdirectory-filter fencing-game/src/de/fencing_game/transport/protocol -- --all
But if you have files in transport as well as transport/protocol, it starts to get hairy.
I wrote some custom tools to do this for a project I was on. They're not published anywhere, but you can do something similar with reposurgeon.
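As a rough illustration of the filter-branch variant, here is a minimal sketch that runs in a throwaway repository. The file names (Message.java, Main.java), the scratch directory and the demo identity are made up for the example, and the path given to --subdirectory-filter is written relative to the repository root, which is an assumption about the real layout. Only try this on a clone, since filter-branch rewrites history.

```shell
#!/bin/sh
set -e
# Silence the deprecation notice that newer git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Build a scratch repository with a layout like the one in the question
# (hypothetical files, demo identity).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo"
git config user.email "demo@example.invalid"
mkdir -p src/de/fencing_game/transport/protocol src/de/fencing_game/client
echo 'lib' > src/de/fencing_game/transport/protocol/Message.java
echo 'app' > src/de/fencing_game/client/Main.java
git add .
git commit -q -m "initial import"
echo 'app v2' > src/de/fencing_game/client/Main.java
git commit -q -a -m "main project only"

# Rewrite all branches so that protocol/ becomes the repository root.
git filter-branch --subdirectory-filter src/de/fencing_game/transport/protocol -- --all

git ls-files        # only Message.java is left, now at the top level
git log --oneline   # the "main project only" commit is no longer in the history
```

This shows why the approach only fits the simple case: everything outside the chosen directory disappears, including files that sit directly in transport.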
Splitting a subtree mixed with files from the parent project
This seems to be a common request, but I don't think there's a simple answer when the folders are mixed together like that.
The general method I suggest to split out the library mixed in with other folders is this:
Make a branch with the new root for the library:
git subtree split -P src/de/fencing_game -b temp-br
git checkout temp-br

# -or-, if you really want to keep the full path:
git checkout -b temp-br
cd src/de/fencing_game
Then use something to rewrite history to remove the parts that aren't part of the library. I'm no expert on this, but I was able to experiment and found something like this to work:
git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch client server otherstuff' HEAD

# also clear out stuff from the sub dir (filter-branch has to be run from the top level of the working tree, so give the paths relative to it instead of changing into transport)
git filter-branch --tag-name-filter cat --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch transport/fencing transport/client transport/server' HEAD
Note: You might need to delete the back-up made by filter-branch between successive commands.
git update-ref -d refs/original/refs/heads/temp-br
Lastly, just create a new repo for the library and pull in everything that's left:
cd <new-lib-repo>
git init
git pull <original-repo> temp-br
I recommend that your final library path be more like /transport/protocol instead of the full parent project path, since that seems kind of tied to the project.
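To make the whole procedure easier to try out, here is a minimal end-to-end sketch on a scratch repository. It uses the variant that keeps the full path, so no subtree split is involved. All file names (Connection.java, Message.java, Glue.java, Main.java, build.xml), the directory names under the temporary folder and the demo identity are assumptions made up for the example; the paths handed to git rm are relative to the repository root.

```shell
#!/bin/sh
set -e
# Silence the deprecation notice that newer git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Scratch copy of the "main project" (hypothetical files).
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Demo"
git config user.email "demo@example.invalid"
base=src/de/fencing_game
mkdir -p "$base/transport/protocol" "$base/transport/client" "$base/client"
echo 'lib'      > "$base/transport/Connection.java"
echo 'protocol' > "$base/transport/protocol/Message.java"
echo 'glue'     > "$base/transport/client/Glue.java"
echo 'app'      > "$base/client/Main.java"
echo 'build'    > build.xml
git add .
git commit -q -m "initial import"
echo 'app v2' > "$base/client/Main.java"
git commit -q -a -m "main project only"

# Rewrite a separate branch so the original branch stays untouched.
git checkout -q -b temp-br

# Drop everything that is not part of the library from every commit;
# --prune-empty then discards commits that no longer change anything.
git filter-branch --prune-empty --index-filter '
    git rm -rf -q --cached --ignore-unmatch \
        build.xml \
        src/de/fencing_game/client \
        src/de/fencing_game/transport/client
' HEAD

# Pull the rewritten branch into a fresh repository for the library.
git init -q "$work/my-library"
cd "$work/my-library"
git pull -q "$work/project" temp-br
git ls-files
```

Listing the paths to remove gets tedious when the main project has many top-level entries, so on a real repository it is worth checking the result with git log --name-only before publishing anything.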
The issue here is that there is no good separation of what is and isn't part of your library. I would strongly suggest refactoring the solution first; then you can simply include the library as a submodule.
If the reuse of this library will be just in the same repo by other devs, just track those changes on a separate branch and don't bother with additional repos.
Will the history of the project be for your benefit only, or for the benefit of people on github?
If the history is for your benefit only, there is a simple way using grafts. Basically, just create a brand new repository for github, removing all proprietary code. Now you have an open source repo with only public code which you can push to github. In your local copy of the open source repo, you can graft the history from the proprietary repo onto the open source repo.
Doing it this way means you (or anyone with access to the proprietary repo) have the benefit of seeing the full history, but the general public will only see the code from the point you open sourced it.
What are .git/info/grafts for?
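A minimal sketch of the grafting idea, using two scratch repositories. It uses git replace --graft rather than editing .git/info/grafts by hand, because newer git versions steer users away from the grafts file; that substitution is my assumption, not something the answer prescribes. The repository names, Lib.java and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical proprietary repository with two commits of private history.
git init -q "$work/private"
cd "$work/private"
git config user.name "Demo"
git config user.email "demo@example.invalid"
echo 'v1' > Lib.java
git add .
git commit -q -m "private: first version"
echo 'v2' > Lib.java
git commit -q -a -m "private: second version"

# Brand new public repository that starts from the current state of the code.
git init -q "$work/public"
cd "$work/public"
git config user.name "Demo"
git config user.email "demo@example.invalid"
cp "$work/private/Lib.java" .
git add .
git commit -q -m "public: initial open source release"

# Local-only step: fetch the private history and make the public root commit
# appear to have the private tip as its parent.
git fetch -q "$work/private" HEAD
old_tip=$(git rev-parse FETCH_HEAD)
public_root=$(git rev-list --max-parents=0 HEAD)
git replace --graft "$public_root" "$old_tip"

git log --oneline                        # three commits with the graft in place
git --no-replace-objects log --oneline   # one commit: the history without the graft
```

The replacement lives under refs/replace/, which a plain git push does not send, so the grafted history stays in the local copy unless those refs are pushed explicitly.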
I've done something similar, but splitting several directories into an entirely separate repo on an encrypted partition (/secure/tmp/newrepo), so that they were not available to a laptop thief. I wrote the shell script below and then ran:
git filter-branch --tree-filter '~/bin/tryit /secure/tmp/newrepo personal private' -- 95768021ff00216855868d12556137115b2789610..HEAD
(The SHA avoids commits from before either directory came into existence.)
#!/bin/sh
# To be used with e.g.:
#   git filter-branch --tree-filter '~/bin/tryit /secure/tmp/newrepo personal private'
# Don't do it on any repository you can't repeatedly recreate with:
#   rm -rf foo ; git clone /wherever/is/foo
# when it breaks.
SRC=$(pwd)
DEST=$1
shift
MSG=/dev/shm/msg.txt
TAR=/dev/shm/tmp.tar
LIST=/dev/shm/list.txt
LOG=/dev/shm/log
DONE=''
echo "$GIT_AUTHOR_DATE" >> "$LOG"
# Save the commit description; it is reused as the commit message in the new repo
git show --raw "$GIT_COMMIT" > "$MSG"
# Move every listed directory that exists in this commit over to the new repo
for A in "$@"
do
    if [ -d "$A" ]
    then
        DONE=${DONE}x
        tar -cf "$TAR" "$A"
        tar -tf "$TAR" > "$LIST"
        cat "$LIST" >> "$LOG"
        rm -rf "$A"
        cd "$DEST"
        tar -xf "$TAR"
    else
        echo "$A non-existent" >> "$LOG"
    fi
    cd "$SRC"
done
if [ -z "$DONE" ]
then
    echo Empty >> "$LOG"
else
    cd "$DEST"
    # filter-branch points these at the source repo; clear them before committing here
    unset GIT_INDEX_FILE
    unset GIT_DIR
    unset GIT_COMMIT
    unset GIT_WORK_TREE
    touch foo
    git add .
    git commit -a -F "$MSG" >> "$LOG"
fi
# filter-branch dies if the last command of the script returns non-zero
exit 0
For your purposes you'd probably want to have a different spec for the tar (e.g. --exclude= ) and then use cat ${LIST} | xargs rm to only remove stuff in the tar, but getting that right is not too tricky, I hope.
The unset stuff and exit 0 are important, since filter-branch sets those to your source repo (not what you want!) and will die if sh passes on a non-zero exit code from the last command in your script.
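The tar variation mentioned above could look roughly like the following sketch, which runs on plain scratch files and does not need git. The directory layout and file names are invented for the example, and a while-read loop stands in for cat ${LIST} | xargs rm so that file names containing spaces are handled; treat it as one possible shape, not as a tested drop-in for the script.

```shell
#!/bin/sh
set -e
# Scratch layout: a library directory with a subdirectory that must stay behind.
work=$(mktemp -d)
cd "$work"
mkdir -p transport/protocol transport/client dest
echo 'lib'      > transport/Connection.java
echo 'protocol' > transport/protocol/Message.java
echo 'glue'     > transport/client/Glue.java

TAR=$work/tmp.tar
LIST=$work/list.txt

# Archive the directory, leaving out the part that belongs to the main project.
tar -cf "$TAR" --exclude='transport/client' transport
tar -tf "$TAR" > "$LIST"

# Remove only the regular files that went into the archive
# (directory entries in the listing end with a slash and are skipped).
grep -v '/$' "$LIST" | while IFS= read -r f
do
    rm -f -- "$f"
done

# Unpack the archive at the destination.
(cd dest && tar -xf "$TAR")
```

After this, the excluded directory is still present in the source tree and only the archived files have moved to the destination.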