The concept of snapshot

Git's basic terminology includes the concept of a snapshot.

The concept appears in the basic Git workflow:

  1. You modify files in your working directory.

  2. You stage the files, adding snapshots of them to your staging area.

  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

Could you explain exactly what a snapshot is, show a small example of files and their snapshots, and explain why Git uses snapshots instead of storing differences as other VCSs do?


A snapshot is simply a file's contents at a given point in time. Conceptually, every version control system operates on snapshots: you want to be able to see what your source code looked like at any point in the past. Most of them, git included, also store diffs (deltas) to save storage space. Where git differs is in two ways: the way diffs are computed and stored internally isn't directly tied to the file's history, and the diffs aren't recomputed every single time they could be.
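To make "snapshot" concrete, here is a minimal sketch in Python of a content-addressed store. The `blob_id` function follows the way git derives a blob's id in SHA-1 repositories (a `blob <size>\0` header plus the content); the `objects` dict and the `snapshot` helper are hypothetical names made up for illustration, not git's actual implementation.

```python
import hashlib

def blob_id(content):
    """Id git gives to file content in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

objects = {}  # hypothetical in-memory object store: id -> full content

def snapshot(files):
    """Store every file's full content; return a path -> id mapping."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        objects[oid] = content  # identical content lands on the same key
        tree[path] = oid
    return tree

# Two "commits": README changes, main.c does not.
commit1 = snapshot({
    "README": b"hello\n",
    "main.c": b"int main(void) { return 0; }\n",
})
commit2 = snapshot({
    "README": b"hello world\n",
    "main.c": b"int main(void) { return 0; }\n",
})

print(commit1["README"])                      # ce013625030ba8dba906f756967f9e9ca394464a
print(commit1["main.c"] == commit2["main.c"])  # True: unchanged file is shared
print(len(objects))                           # 3 stored contents, not 4
```

Each commit refers to the complete content of every file, but a file that did not change costs nothing extra because both commits point at the same stored object.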

Let's say you have a 1000-byte file that gets updated on practically every build. If you change one byte of it, git will, until the next repack, store a completely new (compressed) copy of the file, with the one byte changed. This is where people flip out and say, "OMG, git is so stupid, it should store the diffs right away. I'm sticking with subversion."
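The 1000-byte case can be sketched roughly like this. The `loose_object` helper is a hypothetical name; it mimics how a freshly written ("loose") blob is stored, as the zlib-compressed header plus full content, and it writes nothing to disk.

```python
import hashlib
import zlib

def loose_object(content):
    """Return (id, stored_bytes) roughly as git writes a loose blob."""
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

v1 = bytes(range(250)) * 4          # a 1000-byte file
v2 = v1[:500] + b"\xff" + v1[501:]  # the same file with one byte changed

id1, stored1 = loose_object(v1)
id2, stored2 = loose_object(v2)

print(id1 != id2)                              # True: two separate objects
print(zlib.decompress(stored2).endswith(v2))   # True: the full new content is stored
```

Nothing in the second object refers to the first one; it is a complete copy, which is exactly what makes it cheap to read back.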

However, think about how you actually use your source control. Almost everything you want to compare against is something that has changed since the last time you pushed. Because it hasn't computed the diffs yet, git just happens to have a full, easily accessible cache of all those recently-changed files, whereas a system that stores only a chain of diffs has to start from a base version and apply potentially hundreds of diffs to reconstruct the same content.
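A toy model of that difference in read cost, under the assumption that a "diff" is just a one-byte overwrite. It does not reflect how any particular VCS encodes its deltas; all names here are made up for the illustration.

```python
def apply_delta(content, delta):
    """Apply a toy delta: overwrite one byte at the given offset."""
    offset, new_byte = delta
    return content[:offset] + bytes([new_byte]) + content[offset + 1:]

base = bytes(1000)                              # version 0: 1000 zero bytes
deltas = [(i, i % 256) for i in range(1, 200)]  # 199 one-byte changes

# Snapshot-style storage: keep every version's full content.
snapshots = [base]
for delta in deltas:
    snapshots.append(apply_delta(snapshots[-1], delta))

def read_from_snapshots(version):
    """One lookup, no deltas to apply."""
    return snapshots[version], 0

def read_from_delta_chain(version):
    """Start at the base and replay every delta up to the version."""
    content = base
    for delta in deltas[:version]:
        content = apply_delta(content, delta)
    return content, version

latest = len(deltas)
a, steps_a = read_from_snapshots(latest)
b, steps_b = read_from_delta_chain(latest)
print(a == b, steps_a, steps_b)  # True 0 199
```

Both approaches return the same bytes; the snapshot store pays in disk space, the delta chain pays in work at read time.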

Then, when you push to share your changes, git packs those objects so they can be transferred efficiently over the network, and diffs are computed at that point; git gc, which git also runs automatically from time to time, does the same for local storage. However, it's not necessarily a diff from version n-1 to version n of the file. If content is repeated across many files, git can take that into account. If the same change is made in several branches, git can take that into account. If a file is moved, git can take that into account. If some heuristic is discovered in the future that can make things more efficient, git can take that into account without breaking existing clients. It's not wedded to the idea that the diff must always be from one consecutive version to the next.
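The idea that the base of a diff does not have to be version n-1 can be illustrated with a small sketch. It uses Python's `difflib` to pick the most similar stored content as the base; git's real selection heuristics are different and are not shown here, and `best_base` is a hypothetical helper.

```python
import difflib

def similarity(a, b):
    """Similarity ratio between two lists of lines (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

def best_base(target, candidates):
    """Pick the stored version that is closest to the target content."""
    return max(candidates, key=lambda name: similarity(candidates[name], target))

v1 = ["alpha", "beta", "gamma", "delta"]
v2 = ["one", "two", "three", "four"]        # a complete rewrite
v3 = ["alpha", "beta", "gamma", "epsilon"]  # mostly a return to v1

print(best_base(v3, {"v1": v1, "v2": v2}))  # v1
```

A system tied to consecutive versions would have to encode v3 against v2 and effectively store the whole file again; one that can choose its base freely encodes v3 as a one-line change against v1.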

It's fundamental design decisions like these that make git so fast compared to other version control software.
