SVN Error: Can't convert string from native encoding to 'UTF-8'

Posted 2022-12-17 14:23

I've got a post-commit hook script that performs an SVN update of a working copy whenever a commit is made to the repository.

When users commit to the repository from their Windows machines using TortoiseSVN, they get the following error:

post-commit hook failed (exit code 1) with output:
svn: Error converting entry in directory '/home/websites/devel/website/guides/Images' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':
svn: Teneriffa-S?\195?\188d.jpg

The file in question is Teneriffa-Süd.jpg (note the umlaut on the u). The site is German, so the files have German names.

When I run an update on the working copy from the Linux command line, no errors occur. The error only appears when the post-commit hook is triggered by a commit from a Windows SVN client.

Questions:

Why would SVN try to change the encoding of a file?
Are filenames allowed to contain chars that are outside the Windows standard ASCII ones?

Update:

It turns out that the filename displays correctly as Teneriffa-Süd.jpg when viewed from a Windows machine (via Samba), but when I view it on the Linux server where the file resides (using SSH and PuTTY), I get Teneriffa-SÃ¼d.jpg.
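The two displays can be reproduced with a minimal sketch. It assumes the PuTTY session was decoding bytes as ISO-8859-1 (the post does not state the terminal's setting), and it relies on `iconv` being installed. The bytes svn printed as `\195` and `\188` are decimal for 0xC3 0xBC, which is "ü" in UTF-8.

```shell
# The filename as UTF-8 bytes. Octal escapes (\303\274 = 0xC3 0xBC) are used
# because they work in every POSIX printf, unlike \x escapes.
utf8_name=$(printf 'Teneriffa-S\303\274d.jpg')

# Reinterpret those same bytes as ISO-8859-1 and re-encode them as UTF-8.
# Each byte of "ü" becomes its own character, which is the mojibake
# a Latin-1 terminal shows.
mojibake=$(printf '%s' "$utf8_name" | iconv -f ISO-8859-1 -t UTF-8)

printf '%s\n' "$mojibake"   # prints Teneriffa-SÃ¼d.jpg on a UTF-8 terminal
```

If this matches what the terminal shows, the name on disk is probably valid UTF-8 and only the terminal (or the locale of the process reading it) is misinterpreting it.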

Yet another example:

$ svn update
svn: Error converting entry in directory '.' to UTF-8
svn: Can't convert string from native encoding to 'UTF-8':

$ export LC_CTYPE=en_US.UTF-8

$ svn update

(... and all is fine now)
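To see what a given LC_CTYPE value actually resolves to before running svn, something like the following may help. `charmap_for` is a hypothetical helper name; it wraps the standard `locale charmap` command. If the requested locale is not installed, the system typically falls back to the C locale, so the output will not be UTF-8.

```shell
# Print the character set that a given LC_CTYPE value resolves to.
# LC_ALL is emptied for the call because it would otherwise override LC_CTYPE.
charmap_for() {
    LC_ALL= LC_CTYPE="$1" locale charmap 2>/dev/null
}

# The C locale is plain ASCII (glibc usually reports ANSI_X3.4-1968),
# which cannot represent "ü".
charmap_for C

# Expected to report UTF-8, but only if this locale is installed.
charmap_for en_US.UTF-8
```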

It does not change the encoding of the file. It changes the encoding of the filename (to something that every client can hopefully understand).
Allowed by whom? NTFS stores names as 16-bit code units (UTF-16), and Windows can expose file names in various encodings depending on how you ask for them (it will try to convert them to the encoding you request). How you ask depends on the specific svn client you use. It sounds to me like a bug in TortoiseSVN.

Edit to add:

Ugh. I misunderstood the symptoms. The svn server stores everything in UTF-8 (and it seems that it did so successfully).

The post-commit hook is the bit that fails to convert from UTF-8. If I understand you correctly, the post-commit hook on the server triggers an svn update to a shared drive (so the svn server starts an svn client against itself)? This means that the configuration that needs fixing is that of the client on the server. ~~Check the LANG / LC_ALL on the environment executing the svn server.~~ As it happens, hooks are run in an empty environment (see Tip), so you should set the variable in the hook itself.
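A minimal sketch of such a hook follows. The locale name and the working-copy path are assumptions (the path is guessed from the error message in the question); substitute a UTF-8 locale that `locale -a` lists on your server and your real working copy.

```shell
#!/bin/sh
# Hypothetical hooks/post-commit script. Subversion invokes it with the
# repository path as $1 and the new revision number as $2.

# Hooks start with an empty environment, so the locale must be set here.
# Assumption: en_US.UTF-8 is installed; any UTF-8 locale should do.
export LC_ALL=en_US.UTF-8

# Assumed working-copy location; can be overridden from outside for testing.
WC="${WC:-/home/websites/devel/website}"

# Only update if the path really is a working copy.
if [ -d "$WC/.svn" ]; then
    svn update --non-interactive "$WC"
fi
```

Because the hook environment is empty, PATH may be unset as well, so using the full path to the svn binary is a reasonable precaution.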

See also this page for information on how svn handles localisation.

If the error is:

[abc@288832-web3 public_html]$ svn update
svn: Error converting entry in directory 'images' to UTF-8
svn: Valid UTF-8 data
(hex: 46 65 6e 65 72 62 61 68)
followed by invalid UTF-8 sequence
(hex: e7 65 2b 46)

Then decode the bytes that svn reports as valid:

[abc@288832-web3 public_html]$ printf "\x46\x65\x6e\x65\x72\x62\x61\x68\n"
Fenerbah

(This means that the system has some file name starting with "Fenerbah" in that folder.)

[abc@288832-web3 public_html]$ cd  images
[abc@288832-web3 images]$ rm -rf Fenerbahçe+Forma+2.jpg

So you can see that the name contains a special character (ç, stored here as the single byte e7), which is not valid UTF-8, so SVN cannot convert it.
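Instead of decoding hex by hand, the offending names can be listed directly. This is a sketch that assumes `iconv` is available; `is_valid_utf8` and `list_invalid_names` are hypothetical helper names, not part of svn.

```shell
# Succeeds only if the argument is a valid UTF-8 byte sequence.
# iconv exits non-zero when it meets an illegal input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print the names of entries in directory $1 that are not valid UTF-8.
list_invalid_names() {
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue     # skip the unexpanded glob in an empty dir
        name=${entry##*/}               # strip the directory part
        is_valid_utf8 "$name" || printf '%s\n' "$name"
    done
}

# Usage: list_invalid_names images
```

Renaming a reported file to a UTF-8 name is a less destructive alternative to deleting it.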

Put this in your post-commit hook: export LANG=xxxxx (your locale).

Just put the following line in your script before executing any svn command. Use the appropriate language code; in the following example I used Japanese:

export LC_ALL=ja_JP.UTF8

Don't forget to generate those locales on your system (as root).

Example for Russian:

locale-gen ru_RU.CP1251
locale-gen ru_RU.UTF-8
dpkg-reconfigure locales
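Before relying on a locale in a hook, it may be worth confirming that it was actually generated. The sketch below uses the standard `locale -a` listing; `has_locale` is a hypothetical helper name. Names are normalised because glibc lists `en_US.UTF-8` as `en_US.utf8`.

```shell
# Succeeds if the named locale appears in the output of `locale -a`.
# Both sides are lowercased and stripped of "-" before comparing.
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$want"
}

# Usage
if has_locale ru_RU.UTF-8; then
    echo "ru_RU.UTF-8 is available"
else
    echo "ru_RU.UTF-8 is missing; generate it first"
fi
```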

It changes the encoding to a locale-neutral one (UTF-8) in case someone with a different encoding checks the file out.
Of course. But it's not "Windows" ASCII (Windows uses UTF-16 internally, and legacy code pages such as CP1252 for non-Unicode programs).

The best way to fix this is to make sure that your system uses UTF-8 whenever possible (check $LANG).

It seems that all LC_ variables need .UTF-8 at the end. For example, I happened to have LC_ALL, LC_TIME, and LC_CTYPE defined. Setting LC_CTYPE alone did not solve the problem, so I had to set LC_ALL as well, and then it worked:

LC_ALL=en_US.UTF-8
LC_TIME=en_DK.UTF-8
LC_CTYPE=en_US.UTF-8
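The likely reason LC_CTYPE alone was not enough is the POSIX precedence order: LC_ALL overrides LC_CTYPE, which overrides LANG. A rough sketch of that lookup, with `effective_ctype` as a hypothetical helper name:

```shell
# Print the locale name that governs character encoding, following the
# POSIX precedence: LC_ALL, then LC_CTYPE, then LANG, then the default "C".
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Show which value wins in the current shell.
effective_ctype
```

So if LC_ALL is already set to a non-UTF-8 value, changing LC_CTYPE has no effect until LC_ALL is changed or unset.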

To avoid the problem in future, I copied the file to a different name, removed the old one from svn, added the new one, and sent a message to my collaborator asking them not to do this again.

I got a similar problem when running "svn add" on a directory, but the solution was different. I couldn't see the "hex" digits using printf (actually no hex output was shown by svn), but this command allowed me to see the results, and fix it:

LC_ALL=C svn add probealign

I think that, in general, putting LC_ALL=C before your command lets you see the offending files, and it is a lot easier than pasting in a lot of \x72 sequences (which apparently may not always be shown).
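The same idea can be used to list candidate files up front. In the C locale, grep treats input as raw bytes, so any name containing a byte outside printable ASCII can be matched. This is only a sketch; `find_non_ascii_names` is a hypothetical helper name, and the directory in the usage line is the one from the example above.

```shell
# Print every path under directory $1 that contains a byte outside the
# printable ASCII range (space through tilde). LC_ALL=C makes the bracket
# expression operate on bytes rather than on locale-dependent characters.
find_non_ascii_names() {
    find "$1" | LC_ALL=C grep '[^ -~]'
}

# Usage: find_non_ascii_names probealign
```

Note that this flags every non-ASCII name, including ones that are perfectly valid UTF-8, so it narrows down the candidates rather than proving which file is broken.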

For information, I got this "native encoding to 'UTF-8'" error on commit with the TortoiseSVN client on Windows

when my repository URL was:

http://x.x.x.x/svn/myrepos

I changed the repository URL to:

svn://x.x.x.x/myrepos

and now everything works perfectly.

I think this information will be useful to some.

In my case, ~/.subversion/config contained the setting log-encoding = ...

Commenting it out fixed the problem.
