What are the underlying git merge processes within the staging area?

Posted 2023-03-28 22:33

Git does the merge magic, and then lets the user resolve real conflicts, which is as it should be. I'm looking for a low-level description of the how and why of the basic git merge and how it uses the staging area.

I've just read the Git Parable, and this comment on it here:

Even taking into account the fact that it is a "parable" and not a recounting of the history of Git (which you can find in some detail on the Git Wiki, by the way), one point stands: it is IMVHO bad practice to explain the staging area in terms of splitting changes into more than one commit and/or committing with a dirty tree, i.e. with some changes uncommitted. The staging area's main strength (besides being an explicit version of other SCMs' implicit to-be-added area) is dealing with CONFLICTED MERGES, and that is how it should be explained, I think.

The git merge man page identifies the stage 1/2/3 elements of the merge, but understandably doesn't go into the whys and wherefores.
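For readers unfamiliar with the stage numbers: per the git merge documentation, stage 1 holds the common ancestor's version of a path, stage 2 holds HEAD's version ("ours") and stage 3 holds MERGE_HEAD's version ("theirs"). A cleanly merged path sits at stage 0. Below is a simplified Python sketch of the per-path "trivial merge" decision that fills those stages. It assumes each version is identified only by its blob id (None meaning the path is absent) and it stops where git would go on to attempt a line-level content merge. `merge_path` is a hypothetical name, not a git API.

```python
from typing import Optional

# Index entries are modelled as (stage, blob_id) pairs.
# Stage numbers follow git's convention: 0 = merged, 1 = base, 2 = ours, 3 = theirs.
def merge_path(base: Optional[str], ours: Optional[str],
               theirs: Optional[str]) -> list[tuple[int, str]]:
    """Simplified trivial three-way merge of one path, comparing blob ids only."""
    if ours == theirs:
        # Both sides agree (including both deleting the path): take that result.
        result = ours
    elif base == ours:
        # Only "theirs" touched the path: take their version.
        result = theirs
    elif base == theirs:
        # Only "ours" touched the path: keep our version.
        result = ours
    else:
        # Both sides changed it differently: record each existing version in its
        # own stage and leave resolution to a content merge or a human.
        return [(stage, blob)
                for stage, blob in ((1, base), (2, ours), (3, theirs))
                if blob is not None]
    # A resolved path becomes a single stage-0 entry, or disappears if deleted.
    return [] if result is None else [(0, result)]
```

A conflicting path therefore shows up as several index entries for the same name, which is what `git ls-files --stage` lists during an unresolved merge.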

Can anyone point me to articles on how and why git manages to achieve results that other systems don't (over and above the Linus vs. Bram discussion detailed in Wincent's blog), i.e. the alleged "trivial" part?

Most web articles assume that merges "just happen", and I haven't found anything that explains the issues (e.g. the need for small commits, the value of a common commit, etc.).

This should help with at least some of your questions, as it covers the most common kind of merge git does:

git merge-file

git merge-file is designed to be a minimal clone of RCS merge; that is, it implements all of RCS merge's functionality which is needed by git(1).

Almost every VCS uses the basic concept of a three-way merge, which compares two branches against their common ancestor. If a line of code differs between the two branches, the ancestor tells you which branch changed it. If both branches changed it, you have a merge conflict that must be resolved by a human.
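A minimal Python sketch of that line-by-line rule follows. It assumes the three versions are already aligned line for line (same length, no insertions or deletions). Real tools such as git merge-file first run a diff to line up the changed hunks. The function name `merge_lines` and the conflict-marker labels are illustrative assumptions.

```python
def merge_lines(base: list[str], ours: list[str],
                theirs: list[str]) -> tuple[list[str], bool]:
    """Three-way merge of line-aligned inputs; returns (merged lines, had_conflict)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length, line-aligned inputs")
    merged: list[str] = []
    conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # unchanged on both sides, or the same change on both
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: leave it for a human
            conflict = True
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflict
```

The ancestor is what makes the middle two branches possible. Without it, a line that differs between the two sides could only ever be reported as a conflict.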

There are a few cases where it is difficult to determine a suitable common ancestor. A lot of research went into different algorithms for this, many involving the tracking of additional metadata with the commits.

Linus' essential innovation was the tracking of trees rather than files. That's sort of a subtle distinction. To illustrate with the example from Wincent's blog, consider a file foo in branch A. You branch off to make branch B. In branch A foo is renamed to bar. In branch B, it is deleted. You then attempt to merge.

If you are tracking files, it goes like this:

Before branching, version 1 of file foo is created.

After the next commit, branch A points to version 2 of foo, which is a deleted file, and version 1 of new file bar.

After the next commit, branch B points to version 2.1 of foo, which is a deleted file.

When you merge, version 2 and 2.1 of foo are compared and found to be identical. No merge conflict there. Branch B doesn't even have a file called bar, so no conflict there either. You end up with the merge algorithm silently accepting branch A's rename, even though there was a real conflict between foo being deleted and it being renamed.
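To make the failure concrete, here is a hypothetical per-file merge in Python. It assumes a model where each branch is a mapping from file name to content, and a missing name means the file is deleted. Every file is merged on its own, with nothing linking foo to bar. `merge_per_file` is an illustrative name, not any real VCS's API.

```python
def merge_per_file(base: dict[str, str], a: dict[str, str],
                   b: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Merge each file name independently; a missing name counts as 'deleted'."""
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for name in sorted(set(base) | set(a) | set(b)):
        va, vb, vbase = a.get(name), b.get(name), base.get(name)
        if va == vb:          # same state on both sides (e.g. both "deleted")
            result = va
        elif vbase == va:     # only B changed this file
            result = vb
        elif vbase == vb:     # only A changed this file
            result = va
        else:
            conflicts.append(name)
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts
```

On the example, `merge_per_file({"foo": "hello\n"}, {"bar": "hello\n"}, {})` returns `({"bar": "hello\n"}, [])`. The rename is accepted, and the delete/rename conflict is never reported.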

If you are tracking trees, it goes like this:

Before branching, a blob with hash dcb8bd7a97ab39f4c156a1a96d4b10720a39fb81 is created. A tree is created with an entry containing a label foo pointing to the hash.

After the next commit, branch A points to a tree with an entry containing a label bar pointing to the same hash.

After the next commit, branch B points to an empty tree.

When you merge, the trees are compared, with B showing a deletion and A showing a rename of the blob dcb8bd7a97ab39f4c156a1a96d4b10720a39fb81. The user is asked which one to keep.
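Here is a rough Python sketch of the tree-based view. It assumes trees are flat mappings from path to blob id (real git trees are nested objects) and it only detects exact renames, where the blob id is identical. The functions `changes` and `rename_delete_conflicts` are hypothetical helpers, not git internals.

```python
def changes(base: dict[str, str], side: dict[str, str]) -> dict[str, str]:
    """Describe what one side did to each base path:
    'kept', 'modified', 'deleted' or 'renamed:<new path>'."""
    result: dict[str, str] = {}
    for path, blob in base.items():
        if side.get(path) == blob:
            result[path] = "kept"
        elif path in side:
            result[path] = "modified"
        else:
            # Exact rename: the same blob id appears under a path the base lacked.
            new_paths = [p for p, bl in side.items() if bl == blob and p not in base]
            result[path] = f"renamed:{new_paths[0]}" if new_paths else "deleted"
    return result

def kind(change: str) -> str:
    """Strip the target path from a 'renamed:<path>' description."""
    return change.split(":", 1)[0]

def rename_delete_conflicts(base: dict[str, str], a: dict[str, str],
                            b: dict[str, str]) -> list[str]:
    """Base paths that one side renamed while the other side deleted them."""
    ca, cb = changes(base, a), changes(base, b)
    return [path for path in base
            if {kind(ca[path]), kind(cb[path])} == {"renamed", "deleted"}]
```

With `base = {"foo": h}`, `a = {"bar": h}` and `b = {}` (where `h` is the blob id), side A reads as `renamed:bar`, side B reads as `deleted`, and `foo` is flagged as a conflict.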

You can mitigate the effect somewhat in a file-tracking VCS by adding metadata for renames, but git's approach needs nothing beyond its normal data structures. The metadata approach also has difficulty with complex merges where there are many possible choices for the common ancestor. You could put a billion possible paths between the common ancestor and the two branch heads, and git will still see a blob with the same hash and be able to detect a rename and a delete. It's also difficult to preserve metadata when accepting changes in a patch via email, for instance.
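The blob id is derived from the file's content alone, which is why renaming foo to bar leaves it unchanged. In the classic SHA-1 object format, git hashes a header of the form `blob <size>` plus a NUL byte, followed by the raw content. Here is a short Python sketch. `git_blob_id` is an illustrative name, and repositories using the newer SHA-256 object format will produce different ids.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object id of a blob in a SHA-1 git repository: sha1(b"blob <size>\\0" + content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

In a SHA-1 repository, `git hash-object <file>` should print the same id as `git_blob_id` applied to that file's bytes. The file name never enters the calculation.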

It gets a little trickier with a renamed file that changes at the same time, but by tracking the trees, git has all the information it needs. It sees blob dcb8bd7a97ab39f4c156a1a96d4b10720a39fb81 gone from both branches, but it also sees a new tree entry pointing to a new blob, and can compare the two. If a significant portion of the file matches, it's considered a rename. Obviously this breaks down if you make a ton of changes in a renamed file, but at some point no merge algorithm is going to be able to help you.
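Here is a sketch of similarity-based rename pairing in Python. The 50% threshold mirrors git's default for `-M`/`--find-renames`. The score here comes from `difflib.SequenceMatcher` over lines, which is a stand-in assumption: git computes its own similarity index differently. `find_rename` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import Optional

RENAME_THRESHOLD = 0.5  # mirrors git's default 50% similarity for rename detection

def find_rename(deleted: str, added: dict[str, str],
                threshold: float = RENAME_THRESHOLD) -> Optional[str]:
    """Return the added path most similar to the deleted file's content,
    or None if no candidate reaches the threshold."""
    best_path: Optional[str] = None
    best_score = 0.0
    for path, content in added.items():
        # Similarity in [0, 1] based on matching lines (difflib stand-in for git's score).
        score = SequenceMatcher(None, deleted.splitlines(), content.splitlines()).ratio()
        if score > best_score:
            best_path, best_score = path, score
    return best_path if best_score >= threshold else None
```

A file that keeps three of its four lines scores 0.75 and is paired as a rename. A file that shares nothing scores 0 and is treated as an unrelated addition, which is the "breaks down" case described above.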

See this email from Linus for more insight about his philosophy on this topic.
