What is the native narrow string encoding on Windows?

2023-02-04 12:58 问答作者：

The Subversion API has a number of functions for converting from "natively-encoded" strings to strings that are encoded in UT开发者_高级运维F-8. My question is: what is this native encoding on Windows? Does it depend on locale?

"Natively encoded" strings are strings written in whatever code page the user is using. That is, they are numbers that are translated to the appropriate glyphs based on the correct code page. Assuming the file was saved that way and not as a UTF-8 file.

This is a candidate question for Joel's article on Unicode.

Specifically:

Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided. The national versions of MS-DOS had dozens of these code pages, handling everything from English to Icelandic and they even had a few "multilingual" code pages that could do Esperanto and Galician on the same computer! Wow! But getting, say, Hebrew and Greek on the same computer was a complete impossibility unless you wrote your own custom program that displayed everything using bitmapped graphics, because Hebrew and Greek required different code pages with different interpretations of the high numbers.

Windows 1252. Jukka Korpela has an excellent page on character encodings, with an extensive discussion of the Windows character set.

From the header svn_string.h you can see that the relevant svn_strings are just plain old const char* + a length element.

I would guess that the "natively encoded" svn strings are interpreted according to your system locale (I do not know this for sure, but this is the convention). On Windows 7 you can check your locale by selecting "Start-->Control Panel-->Region and Language-->Administrative-->Change system locale" where any value of English would probably entail the character encoding Windows 1252. However, a different system locale, for example Hebrew (Israel), would entail a different character encoding (Windows 1255 for the case of Hebrew).

Sadly the MSVC version of the C library does not support UTF-8 and uses legacy codepages only, but cygwin provides a UTF-8 locale as part of its emulation layer. If your svn is built on cygwin, you should be able to use UTF-8 just fine.

继续阅读：c character-encoding string winapi

What is the native narrow string encoding on Windows?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？