git says "fatal: confused by unstable object source data"
Just for fun, I am trying to put around 85GB of mostly-around-6MB binary files into git. Git chugs along for a while but invariably fails about halfway through with the message "fatal: confused by unstable object source data" followed by a SHA-1. Do you know why? Is there any way to fix it?
Either
- one or more files are being modified during your operation, or
- something is causing inconsistent reads (e.g. failing hardware).
Short version: Git’s developers did not intend for it to be used on volatile files.
Due to the layout* that Git uses for “loose objects” and the limited filesystem semantics that it assumes**, Git must know the first byte (two hex characters) of the object name (SHA-1) of a new object before it can start storing that object.
* The objects/[0-9a-f][0-9a-f]/ directories. See gitrepository-layout.
** Specifically, it needs to be able to do "atomic" file renames. Certain filesystems (usually network filesystems; specifically AFS, I believe) only guarantee rename atomicity when the source and the destination of a rename are inside the same directory.
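To make the layout concrete, here is a small illustration run in a throwaway repository (demo.txt is just a placeholder file; the hash shown is the well-known blob name for the six bytes "hello" plus a newline):
echo 'hello' > demo.txt
git hash-object -w demo.txt
# prints ce013625030ba8dba906f756967f9e9ca394464a, the blob's object name
ls .git/objects/ce/
# 013625030ba8dba906f756967f9e9ca394464a  -- the first byte picks the directory, the rest names the file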
Currently, Git does two SHA-1 passes over each new file. The first pass is used to check whether Git already knows about the contents of the file (whether its SHA-1 object name already exists in the object store). If the object already exists, the second pass is not made.
For new contents (object was not already in the object store), the file is read a second time while compressing and computing the SHA-1 of the data being compressed. The compressed data is written to a temporary file that is only renamed to its final loose object name if the initial SHA-1 ("already stored?" check) matches the later SHA-1 (hash of the data that was compressed and written). If these SHA-1 hashes do not match, then Git shows the error message you are seeing and aborts. This error checking was added in 748af44c63, which was first released in Git 1.7.0.2.
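Conceptually, the check boils down to hashing the file twice and refusing to store the object when the results disagree. A rough sketch in shell (bigfile.bin is a placeholder; inside a real git add the second hash is computed while compressing and writing, not by a separate command):
first=$(git hash-object bigfile.bin)   # pass 1: object name for the "already stored?" check
second=$(git hash-object bigfile.bin)  # pass 2: stands in for the hash taken during compression
if [ "$first" != "$second" ]; then
  echo "fatal: confused by unstable object source data $first"  # the file changed between the two reads
fi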
There is another possibility, even if a remote one: a really big file (e.g. 3 GB or more). Put simply, Git is unable to handle it. We ran into this error while trying to create a repository from a directory structure containing huge files.
From the source, the blob's sha1 is computed twice:
- write_sha1_file_prepare
- write_loose_object
both called from write_sha1_file (there's also a path from force_object_loose, but it is used for repacks).
The first hash is used to check whether the object is already known (though Git tries its best to get the filesystem's reassurance that files are unmodified, a touch or the like would make it lose track); the second is the hash of the data that is actually fed to zlib for compression and then written.
The second hash might be a bit more expensive to compute because it is interleaved with zlib, which may explain why two hashes are computed (though that seems to be a historical accident, and I'm guessing the performance cost when adding a new object matters more than the CPU win when detecting spurious changes). Someone could add a fallback so that the write_changed_sha1 existence-checking logic is redone with the new SHA-1, so that such unstable files could still be added. That would be useful for backups, when a few of the files being added are open.
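If you want to see that failure mode in isolation, one way (timing-dependent, so it may not trigger on every run) is to keep appending to a large file while adding it:
dd if=/dev/urandom of=big.bin bs=1M count=512                 # create a reasonably large file
( while true; do printf x >> big.bin; sleep 0.1; done ) &     # keep mutating it in the background
writer=$!
git add big.bin    # may abort with "confused by unstable object source data"
kill $writer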
Two theories:
Something is writing to these files while you are trying to put them into git.
You have some sort of disk/memory failure causing data corruption.
Although other responses have provided a very good explanation of why the error occurs, here is a possible solution to the problem:
Track down the problematic file: adding -v to your git add command will give you some clue about which file is causing trouble:
git add -Av
The problem might just be that the file is too large (a zipped source archive, some SQL data file): add it to .gitignore.
In fact, a good practice is to regularly maintain your .gitignore file so that it excludes compiled and compressed files, as in: https://gist.github.com/octocat/9257657
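As a purely illustrative example (the patterns below are guesses; adapt them to whatever your build actually produces), such entries could look like:
cat >> .gitignore <<'EOF'
*.o
*.class
*.zip
*.tar.gz
*.sql
EOF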
This can also happen if you try to git svn clone or git svn fetch a repository on a btrfs filesystem; it may have something to do with a race condition or atomicity issue in the btrfs copy-on-write (CoW) feature.
Example:
git svn --authors-file=authors.map clone http://svn.example.com/svn/repo repo
or
cd repo; git svn --authors-file=../authors.map fetch
I found a workaround: set up your base working directory without copy-on-write:
chattr +C .
Then you need to duplicate all your data, since the attribute only applies to files created after it is set. For example:
cp -fr repo repo.new; rm -fr repo; mv -f repo.new repo
cp authors.map authors.map.new; mv -f authors.map.new authors.map
Then it should not fail (and run faster).
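If you want to confirm the attribute took effect, lsattr should show the C (No_COW) flag on the directory and on the re-created files (just a sanity check; the exact output format depends on your e2fsprogs version):
lsattr -d .         # the directory should list the 'C' attribute
lsattr repo | head  # the freshly copied files should list it too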
In my experience, just a large number of added files can cause this message. Committing the same files in several smaller batches was not a problem.