Find a branching-off point for repo-less code
I have a Rails plugin which was copied from a git repo with script/plugin install at some point. Later, local patches were added to it. Now we want to maintain the code as a separate branch in a fork of the original plugin's own repo.
Given a git repo and a code tree, what's a good way to find the commit closest to the new code, e.g. the one minimizing the total number of diff lines?
If you can recover the time at which you installed the plugin, I would look up the last commit made at or before that time and branch from there. Otherwise, you are going to have a hairy time.
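For what it's worth, the lookup by time could go something like the sketch below. commit_before is a made-up helper name, and the clone path and date you pass in are placeholders. Note that --before compares against the committer date, so this assumes upstream history was not rewritten after you installed.

```shell
# Hypothetical helper: print the newest commit made at or before a given date.
# $1 = path to a clone of the upstream repo
# $2 = date, in any form git understands (e.g. "2010-06-01 12:00")
# $3 = branch or ref to search (defaults to HEAD)
commit_before() {
    git -C "$1" rev-list -1 --before="$2" "${3:-HEAD}"
}
```

You could then branch from the result with something like git checkout -b local-patches "$(commit_before . '2010-06-01')" from inside the clone.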
You are essentially asking for the minimum edit distance between your code and every commit in a git repo. That is an expensive search, since for each candidate commit you need both the tree diff and the edit distance of each git blob (that is, the code files and other objects).
You could try to find the needle in the haystack with the help of git diff-tree: first clone the plugin's repo, make a branch, then commit all your changes on top of it. diff-tree will then let you assess the difference, but you would have to repeat this for every commit, and by hand that would be hell.
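If the repo is not too large, the per-commit comparison can be brute-forced by a script rather than done by hand. The following is only a sketch under a few assumptions: closest_commits is a name I made up, the plugin lives at the root of the upstream repo, and "closeness" is measured as added plus deleted lines from git diff --numstat. It snapshots your code tree as a git tree object through a throwaway index, so nothing in the clone's working tree or branches is touched.

```shell
# Hypothetical helper: rank upstream commits by diff size against a plain directory.
# $1 = path to a clone of the upstream repo
# $2 = directory holding your (repo-less) copy of the code
closest_commits() {
    repo=$(cd "$1" && pwd) || return 1
    tmp=$(mktemp -d) || return 1
    # Write the directory into the clone's object store as a tree,
    # using a temporary index so the real index stays untouched.
    snapshot=$(
        cd "$2" || exit 1
        export GIT_DIR="$repo/.git" GIT_WORK_TREE="$PWD" GIT_INDEX_FILE="$tmp/index"
        git add -A . && git write-tree
    )
    rm -rf "$tmp"
    [ -n "$snapshot" ] || return 1
    # Score every reachable commit: added + deleted lines (binary files count as 0).
    git -C "$repo" rev-list --all | while read -r commit; do
        score=$(git -C "$repo" diff --numstat "$commit" "$snapshot" |
                awk '{ n += $1 + $2 } END { print n + 0 }')
        echo "$score $commit"
    done | sort -n
}
```

Something like closest_commits ~/src/plugin-upstream vendor/plugins/foo | head -n 5 would then list the five nearest candidates, smallest diff first.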
Instead, I would take your current code, do the above so that you get one huge diff against the HEAD of the plugin repo's master branch, then try to split your changes into as many atomic commits as you can.
It'll hurt, but you just might see the end of it.
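As a rough sketch of that "one huge diff" step: import_tree below is a made-up helper, and the branch name, base and commit message are placeholders. It assumes a clean clone of the plugin's repo and that your copy corresponds to the repo root. Tracked files are removed first so that files you deleted locally show up as deletions instead of silently surviving.

```shell
# Hypothetical helper: record a plain directory as one commit on a new branch.
# $1 = path to a clean clone of the upstream repo
# $2 = directory holding your copy of the code
# $3 = name of the branch to create
# $4 = commit to branch from (defaults to HEAD)
import_tree() {
    repo=$1 tree_dir=$2 branch=$3 base=${4:-HEAD}
    git -C "$repo" checkout -q -b "$branch" "$base" || return 1
    # Empty the working tree of tracked files, then lay your code over it.
    git -C "$repo" rm -r -q --ignore-unmatch . || return 1
    cp -R "$tree_dir"/. "$repo"/ || return 1
    git -C "$repo" add -A
    git -C "$repo" commit -q -m "Import local copy of the plugin"
}
```

After that, git diff --stat master local-patches shows the size of the damage. To split it up, one option is git reset HEAD~1 on the new branch followed by rounds of git add -p and git commit.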
EDIT: here is an alternative that may prove tractable, albeit still annoying. Since you have the history and can get the earliest version, you can calculate the git blob hashes of the "original" files and look for them in the history of the owner's repo. In your history, check out the plugin as it was before you made any changes and compute the blob hash of each file from its content. You can then search the history of the official repo for the hashes you find. Each hit identifies a commit at which that file had exactly the content you originally installed. Then compare the hits across files: the most recent of them is the earliest commit you could have installed from.
The kernel.org git docs provide an example that does just this:
    git log --raw --abbrev=40 --pretty=oneline |
        grep -B 1 "$(git hash-object filename)"
This will find you the commit, shown as its hash and subject line; running git show on that hash gives you the author and timestamp. I will try to think of a way of further automating this easily.
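One way the lookup might be automated over every file is sketched below. blob_commits is a hypothetical helper name, and it assumes you have a pristine (unpatched) copy of the plugin in a directory whose layout matches the upstream repo. It is slow, since it walks the whole log once per file, but it only needs to run once.

```shell
# Hypothetical helper: for each file in a pristine copy, print the upstream
# commits that introduced exactly that blob.
# $1 = path to a clone of the upstream repo
# $2 = directory holding the unpatched copy of the plugin
blob_commits() {
    repo=$(cd "$1" && pwd) || return 1
    ( cd "$2" && find . -type f | sed 's|^\./||' ) | while IFS= read -r path; do
        hash=$(git hash-object "$2/$path")
        git -C "$repo" log --all --raw --abbrev=40 --format='commit %H %ci' |
            awk -v h="$hash" -v p="$path" '
                /^commit / { c = $0 }
                # raw lines look like ":mode mode old new status<TAB>path"; $4 is the new blob
                /^:/ && $4 == h { print p ": " c }'
    done
}
```

Each output line has the form "path: commit <hash> <committer date>", so sorting the result by date shows which file pins down the latest commit.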