
Git and the Umlaut problem on Mac OS X

Today I ran into what looks like a bug in Git on Mac OS X.

For example, I want to commit a file named Überschrift.txt, which begins with the German special character Ü. The git status command gives me the following output:

Users-iMac: user$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   "U\314\210berschrift.txt"
nothing added to commit but untracked files present (use "git add" to track)

It seems that Git 1.7.2 has a problem with German special characters on Mac OS X. Is there a way to get Git to read the file names correctly?


Enable core.precomposeunicode on the Mac:

git config --global core.precomposeunicode true

For this to work, you need to have at least Git 1.8.2.
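
A quick way to check whether the installed Git meets that minimum is sketched below. The helper name version_at_least is made up for this example, the 1.8.2 threshold is simply the one stated above, and only the first three numeric components of the version are compared.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Compares up to three dot-separated numeric components.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" |
        sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# "git --version" prints e.g. "git version 1.7.5"; the third word is the number.
current=$(git --version | awk '{print $3}')

if version_at_least "$current" 1.8.2; then
    echo "Git $current is new enough"
else
    echo "Git $current is too old, upgrade first"
fi
```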

Mountain Lion ships with 1.7.5. To get a newer Git, use either git-osx-installer or Homebrew (requires Xcode).

That's it.


The cause is a difference in how each file system stores the file name.

In Unicode, Ü can be represented in two ways: as the single precomposed character Ü, or as U followed by a combining diaeresis (the "combining umlaut character"). A Unicode string can mix both forms, but because that is confusing, a file system may normalize names so that every umlauted U is stored in just one of them.

On Linux, names usually end up in the former form, called Normalization Form C (composed, or NFC); Mac OS X stores the latter, called Normalization Form D (decomposed, or NFD).

Apparently Git doesn't care about this point and simply uses the byte sequence of the filename, which leads to the problem you're having.
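
The mismatch is easy to see at the byte level. The sketch below builds both spellings of Ü with printf octal escapes (the variable names are only illustrative) and compares them the way a byte-oriented tool would:

```shell
# Both variables render as "Ü", but the underlying bytes differ.
nfc=$(printf '\303\234')     # U+00DC, precomposed (NFC): 2 bytes
nfd=$(printf 'U\314\210')    # "U" + U+0308 combining diaeresis (NFD): 3 bytes

# Count bytes; $(( ... )) strips the padding some wc versions add.
nfc_len=$(( $(printf '%s' "$nfc" | wc -c) ))
nfd_len=$(( $(printf '%s' "$nfd" | wc -c) ))
echo "NFC: $nfc_len bytes, NFD: $nfd_len bytes"

# A plain string comparison is a byte comparison, so the names do not match.
if [ "$nfc" = "$nfd" ]; then
    echo "same byte sequence"
else
    echo "different byte sequences"
fi
```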

The mailing list thread "Git, Mac OS X and German special characters" contains a patch that makes Git compare file names after normalization.


Putting the following in ~/.gitconfig works for me on macOS 10.12.1 Sierra with UTF-8 names:

[core]
    precomposeunicode = true
    quotepath = false

The first option makes Git recompose the decomposed UTF-8 names it gets from the file system, and the second stops it from escaping non-ASCII characters in its output.
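
The same two keys can also be written with git config instead of editing the file by hand. The sketch below points git config at a throwaway file (the cfg variable and the temporary file are assumptions made for the demonstration, so the real ~/.gitconfig is left untouched), prints what Git wrote, and reads the values back:

```shell
# Scratch config file so the demonstration does not modify ~/.gitconfig.
cfg=$(mktemp)

git config --file "$cfg" core.precomposeunicode true
git config --file "$cfg" core.quotepath false

# Show the file Git produced; both keys land in a [core] section.
cat "$cfg"

# Read the values back to confirm they were stored.
pre=$(git config --file "$cfg" --get core.precomposeunicode)
quote=$(git config --file "$cfg" --get core.quotepath)
echo "precomposeunicode=$pre quotepath=$quote"

rm -f "$cfg"
```

Replacing --file "$cfg" with --global applies the same settings to ~/.gitconfig.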


To make git add file work with umlauts in file names on Mac OS X, you may convert file path strings from composed into canonically decomposed UTF-8 using iconv.

# test case

mkdir testproject
cd testproject

git --version    # git version 1.7.6.1
locale charmap   # UTF-8

git init
file=$'\303\234berschrift.txt'    # composed UTF-8 (Linux-compatible)
touch "$file"
echo 'Hello, world!' > "$file"

# convert composed into canonically decomposed UTF-8
# cf. http://codesnippets.joyent.com/posts/show/12251
# printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac | LC_ALL=C vis -fotc 
#git add "$file"
git add "$(printf '%s' "$file" | iconv -f utf-8 -t utf-8-mac)"  

git commit -a -m 'This is my commit message!'
git show
git status
git ls-files '*'
git ls-files -z '*' | tr '\0' '\n'

touch $'caf\303\251 1' $'caf\303\251 2' $'caf\303\251 3'
git ls-files --other '*'
git ls-files -z --other '*' | tr '\0' '\n'


Change the repository's OSX-specific core.precomposeunicode flag to true:

git config core.precomposeunicode true

To make sure new repositories get that flag, also run:

git config --global core.precomposeunicode true

Here is the relevant snippet from the manpage:

This option is only used by Mac OS implementation of Git. When core.precomposeunicode=true, Git reverts the unicode decomposition of filenames done by Mac OS. This is useful when sharing a repository between Mac OS and Linux or Windows. (Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7). When false, file names are handled fully transparent by Git, which is backward compatible with older versions of Git.


Git's output is correct.

Your file name is in UTF-8, with Ü represented as LATIN CAPITAL LETTER U followed by COMBINING DIAERESIS (U+0308, UTF-8 0xcc 0x88) instead of LATIN CAPITAL LETTER U WITH DIAERESIS (U+00DC, UTF-8 0xc3 0x9c). The Mac OS X HFS+ file system decomposes Unicode in this way. Git, in turn, shows the non-ASCII bytes of the file name in octal-escaped form.
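
To check those byte values yourself, the quoted name from git status can be fed back through printf, which understands the same octal escapes, and dumped as hex. A small sketch (the variable names are illustrative, and the tr call only strips the whitespace that od inserts):

```shell
# The name exactly as git status quoted it.
name=$(printf 'U\314\210berschrift.txt')

# Hex dump of the first three bytes: "U" followed by the combining diaeresis.
first3=$(printf '%s' "$name" | od -An -tx1 -N3 | tr -d ' \n')
echo "$first3"    # 55cc88

# printf reads arguments with a leading 0 as octal, so this converts
# the two escaped bytes to hex.
printf '%x %x\n' 0314 0210    # cc 88
```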

Note that Unicode filenames can make your repository non-portable. For example, msysgit has had problems dealing with Unicode filenames.


I had a similar problem with my personal repository, so I wrote a helper script in Python 3. You can grab it here: https://github.com/sjtoik/umlaut-cleaner

The script needs a bit of manual labour, but not much.
