
Running a regex in Git Bash, my file now has funny characters in it

I ran this regex in Git Bash (I am on a Windows machine with Git installed).

perl -pe 's/\[(?:xx_)?([^]]+)\]/\[\u$1\]/g'
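For reference, the substitution is meant to drop an optional xx_ prefix inside square brackets and upper-case the first letter of what remains. A rough Python sketch of the same transformation, working on already-decoded text so that no file encoding is involved (the sample column definition and the function name are made up for illustration):

```python
import re

# Same pattern as the Perl one-liner: a bracketed name with an optional "xx_" prefix.
BRACKETED = re.compile(r"\[(?:xx_)?([^]]+)\]")


def rename_bracketed(sql):
    """Drop the optional xx_ prefix and upper-case the first character of each name."""
    def fix(match):
        name = match.group(1)
        # Perl's \u upper-cases only the first character of $1.
        return "[" + name[:1].upper() + name[1:] + "]"

    return BRACKETED.sub(fix, sql)


# Hypothetical input line, not taken from the real schema file.
sample = "[xx_id] [int] IDENTITY(1,1) NOT NULL,"
result = rename_bracketed(sample)
```

With that sample, result is "[Id] [Int] IDENTITY(1,1) NOT NULL,", which is the kind of output I expected from the Perl command.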

The file now looks as if it is written in Chinese (it is a .sql schema file).

example:

嵛 嬀] IDENTITY(1,1) NOT NULL,

Is there some encoding issue going on?


Isn't that similar to issue 358?

Windows command line and GUI programs use different codepages by default.
For historical DOS compatibility, the command line ("OEM") codepage is 437, while the GUI ("ANSI") codepage is 1252. See the interesting reading here.

The console uses the OEM codepage (437 on my system) while the GUI uses the ANSI codepage (1252 on my system).
When launching a program from the console, cmd.exe usually does not modify the arguments to that program, except if the program happens to be a .bat or .cmd file in which case cmd.exe performs a codepage conversion on the arguments (see "Codepage Conversions").
So git.exe already receives "Daniël" in 1252 encoding, which is why it looks fine when looking at .git/config using Notepad.
When reading user.name, however, no codepage conversion takes place and "Daniël" in 1252 encoding is printed to the console, resulting in "Daniδl" on my system.
Obviously, the situation is different when using MSYS / Git Bash.

So, to sum up, the solution when running Git from cmd.exe (via the .cmd wrappers) seems to be to:

1) change the console font from a raster font to a TrueType font,
2) change the console codepage via "chcp" to match the Windows codepage (whatever that may be).
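
To make the mismatch and the effect of step 2 concrete, here is a small Python sketch. It only simulates the byte interpretation with Python's cp1252 and cp437 codecs and does not talk to a real console; the helper name console_view is made up for the illustration.

```python
def console_view(text, stored_as, console_codepage):
    """Encode text with one codepage and show how another codepage reads the bytes."""
    raw = text.encode(stored_as)
    return raw.decode(console_codepage)


# Name stored in the ANSI codepage (1252), console still on the OEM codepage (437).
mismatched = console_view("Daniël", "cp1252", "cp437")

# Same bytes once the console codepage matches, as after running "chcp 1252".
matched = console_view("Daniël", "cp1252", "cp1252")
```

Here mismatched comes out as "Daniδl" and matched as "Daniël": the byte 0xEB is "ë" in codepage 1252 but "δ" in codepage 437, so nothing in the file is damaged, only the interpretation differs.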


In short, a fix is coming: could you try this beta Git installer and see if you still have the encoding issue?
