Can I avoid using CP1252 on Windows?

2022-12-15 01:50 问答作者：

I would like all my开发者_如何学Go toolkit to use UTF-8 but find that some tools on Windows seem to use CP1252 (which appears to be Windows-specific). Does this create output which is incompatible and if so at which codepoints? If so, can I do anything about it?

(I don't completely understand the issues so I'd be grateful for basic education on these encodings).

Tools hard-coding for code page 1252 on Windows is very unlikely. Much more likely is that it happens to be the default code page on your machine. 1252 is used in Western Europe and the Americas. It is configured in Control Panel, Regional and Language options. They've been using different names for it, on Win7 it is in the Administrative tab, Change System Locale.

Yes, many tools use the default code page unless they have a good reason to chose another encoding. The BOM is such a good reason. Notable examples are Notepad (unless you change the Encoding in the File + Open dialog to something else than Ansi) and C/C++ compilers. There typically isn't anything special you need to do to use the default code page. Guessing the correct code page for a text file when you don't have a BOM is impossible to do accurately. Google "bush hid the facts" for a very amusing war story.

Six years old and still relevant: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Now, about your question: Yes, there are still tools out there that choke on UTF-8 files. But more and more tools are "getting it". If you're developing your own stuff, you might want to look into Python 3 where all strings are Unicode. The philosophy is to convert all your inputs into Unicode (if necessary) as early as possible, and reconvert them to a target encoding as late as possible. There are toolkits out there that will do a good job of guessing the encoding of a particular file (for example, Mark Pilgrim's chardet, a port of Mozilla's encoding detector). This is nice if you're working with files that don't specify an encoding.

CP1252 and UTF-8 are the same for all characters < 128. They differ above that. So if you stick to English and stay away from diacritical marks these will be the same.

Most of the Windows tools will use whatever is set as the current user's current codepage, which will default to 1252 for US Windows. You can change that to another codepage pretty easily. But UTF-8 is NOT one of the available codepage options for Windows. (I wish it was).

Some utilities under Windows will understand the UTF-8 byte-order mark at the start of a file. Unfortunately I don't know how to determine if this will work except to try it.

UTF-8 is supported on Windows but not as a current codepage. You can use UTF-8 for converting to/from it but you cannot set is as current codepage.

First do not try to waste time by setting the codepage - this approach will remind you of Sisyphus myth - you can't really solve the problem using codepages, you have to use Unicode.

The only real solution for you is to build your application as Unicode so it will use UTF-16 and to convert to/from UTF-8 on in/out operations. This is done quite simple because fopen supports reading or writing UTF-8.

Regarding the usage of other Windows tools with UTF-8 file, you should not be aware because if the tool is able to work with ASCII it will work with UTF-8 (even so it may not be able to distinguish between Unicode chars but at least it will be able to load/parse the files).

BTW, You forgot to specify what programming language are you using and what Windows tools are you considering for usage.

Also, if you ware interested about more internationalization stuff please visit my blog.i18n.ro

继续阅读：cp1252 utf-8

Can I avoid using CP1252 on Windows?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？