What are Windows code pages?

Posted 2022-12-25 23:35

I'm trying to gain a basic understanding of what is meant by a Windows code page. I get the feeling it's a translation between a given 8-bit value and some 'abstraction' for a given character graphic.

I ran the following experiment. I created a character literal containing two versions of the letter u with an umlaut: one typed with ALT 129 (which uses code page 437) and one typed with ALT 0252 (which uses code page 1252). When I examined the literal, both characters had the value 252.

Is 252 the universal 8-bit abstraction for u with an umlaut? Is it the Unicode value?

Aside from keyboard input, are there any library routines or system calls that use code pages? For example, is there a function to translate a string using a given code table (as above for the ALT 129 value)?

Windows code pages are a relic of pre-Unicode days, when languages with different characters would still attempt to represent them using one byte per character (or two, in the case of East Asian languages). This is where the concept of a character set comes into play. English, for instance, uses "windows-1252". The various code pages can be installed through the Regional & Language Options control panel. A list of code pages can be found here: http://msdn.microsoft.com/en-us/goglobal/bb964654.aspx
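To make the "one byte, or two for East Asian languages" point concrete, here is a minimal sketch using .NET's System.Text.Encoding class. It counts how many bytes a string takes in windows-1252 (single-byte) versus code page 932 (Japanese Shift-JIS, double-byte). The class name CodePageWidths is hypothetical, and the sketch assumes .NET Core / .NET 5+, where the Windows code pages must be registered through CodePagesEncodingProvider; on .NET Framework they are built in (the provider there comes from the System.Text.Encoding.CodePages package).

```csharp
using System;
using System.Text;

// Hypothetical demo class: compares how many bytes a string needs in
// a single-byte code page versus a double-byte code page.
public static class CodePageWidths
{
    static CodePageWidths()
    {
        // .NET Core / .NET 5+ ship only the Unicode encodings by default;
        // this makes the Windows code pages (1252, 932, ...) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Number of bytes 'text' occupies when encoded with the given code page.
    public static int ByteCount(string text, int codePage) =>
        Encoding.GetEncoding(codePage).GetByteCount(text);

    public static void Main()
    {
        // windows-1252 is single-byte: "Hello" takes 5 bytes.
        Console.WriteLine(ByteCount("Hello", 1252));  // prints 5
        // Code page 932 is double-byte for kanji: U+65E5 takes 2 bytes.
        Console.WriteLine(ByteCount("\u65E5", 932));  // prints 2
    }
}
```

ASCII letters stay one byte even in code page 932; only characters outside the ASCII range need the second byte.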

Within .NET, code pages are accessed through the System.Text.Encoding class, which provides Encoding.Convert for converting bytes from one code page to another. For instance, to convert windows-1252 text to UTF-8 (admittedly usually a fairly pointless exercise), you could use this code:

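using System;
using System.Text;

public static class Program
{
    // Encodes 'text' with the given code page, then converts those bytes to UTF-8.
    // Returns bytes, because a .NET string is always UTF-16 internally.
    public static byte[] GetUtf8BytesFromCodePage(string text, string codePage)
    {
        Encoding windows = Encoding.GetEncoding(codePage);
        byte[] windowsBytes = windows.GetBytes(text);
        return Encoding.Convert(windows, Encoding.UTF8, windowsBytes);
    }

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+, which ship only Unicode encodings by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] utf8Bytes = GetUtf8BytesFromCodePage("Hello World", "windows-1252");
        Console.Out.WriteLine(Encoding.UTF8.GetString(utf8Bytes));
    }
}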

A Windows code page is similar to a code set such as ISO 8859-1. It maps certain numbers (how characters are stored on disk) to certain characters (what appears on the screen, in an abstract sense). It does not correspond directly to a font, though a font may support a given code set or code page. For example, both Courier New and Times Roman can be used to display CP1252 text; they look different on the screen, even though the data on disk is the same.

The first 256 code points of Unicode are the same as the code points of ISO 8859-1. In ISO 8859-1, code point 252 (0xFC) is LATIN SMALL LETTER U WITH DIAERESIS (colloquially, u-with-umlaut, or 'ü').
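The experiment in the question can be reproduced with a short sketch: decode byte 129 with code page 437 (the ALT 129 route) and byte 252 with code page 1252 (the ALT 0252 route), and both come out as Unicode U+00FC, decimal 252. ISO 8859-1 (code page 28591) gives the same answer, matching the point above about Unicode's first 256 code points. The class name UmlautDemo is hypothetical, and the provider registration assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo class: shows that two different code-page bytes
// decode to the same Unicode character.
public static class UmlautDemo
{
    static UmlautDemo()
    {
        // Makes the Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes a single byte with the given code page and returns the Unicode code point.
    public static int ToUnicode(byte value, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(new[] { value })[0];

    public static void Main()
    {
        Console.WriteLine(ToUnicode(129, 437));   // ALT 129, OEM code page 437: prints 252
        Console.WriteLine(ToUnicode(252, 1252));  // ALT 0252, ANSI code page 1252: prints 252
        Console.WriteLine(ToUnicode(252, 28591)); // ISO 8859-1: prints 252
    }
}
```

So 252 is the Unicode code point of 'ü'. The code-page bytes differ (129 versus 252), which is why two ALT codes produced the same stored value once the text was held as Unicode.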

There are code set conversion functions; ICU (International Components for Unicode) supports some. There are Windows-specific code set converters too, I have no doubt; I just don't know what their names are. Which ones you use will depend, in part, on which language(s) you are using.
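For the Windows-specific side, the Win32 API functions are MultiByteToWideChar (code page to UTF-16) and WideCharToMultiByte (UTF-16 to code page). From .NET, the same round trip can be sketched with Encoding.Convert. The example below translates the ALT 129 byte from code page 437 into code page 1252, which is the "translate using a given code table" operation the question asks about. The class name CodePageConvert is hypothetical, and the provider registration assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo class: re-encodes bytes from one code page to another.
public static class CodePageConvert
{
    static CodePageConvert()
    {
        // Makes the Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes 'bytes' with fromCodePage, then re-encodes the text with toCodePage.
    public static byte[] Recode(byte[] bytes, int fromCodePage, int toCodePage) =>
        Encoding.Convert(Encoding.GetEncoding(fromCodePage),
                         Encoding.GetEncoding(toCodePage),
                         bytes);

    public static void Main()
    {
        byte[] oem = { 0x81 };                // 'ü' as typed with ALT 129 (code page 437)
        byte[] ansi = Recode(oem, 437, 1252); // the same character in code page 1252
        Console.WriteLine(ansi[0]);           // prints 252
    }
}
```

Characters that do not exist in the target code page are replaced (typically with '?') rather than raising an error, so a conversion between two code pages can lose information.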

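Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" is a must-read explanation of Unicode and character sets (including code pages).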

A Windows code page is a means of translating an 8-bit value to a character. Most Windows computers in the US use Windows-1252.
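Because a code page is only a lookup table, the same 8-bit value means different characters under different code pages. A minimal sketch decoding byte 0xFC under three of them; the class name SameByteDemo is hypothetical, and the provider registration assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo class: decodes the same byte under several code pages.
public static class SameByteDemo
{
    static SameByteDemo()
    {
        // Makes the Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Interprets a single byte as a character in the given code page.
    public static char Decode(byte value, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(new[] { value })[0];

    public static void Main()
    {
        // 0xFC is u-umlaut in 1252 (Western), superscript n in 437 (US OEM),
        // and the Cyrillic soft sign in 1251 (Cyrillic).
        foreach (int codePage in new[] { 1252, 437, 1251 })
        {
            char c = Decode(0xFC, codePage);
            Console.WriteLine($"{codePage}: U+{(int)c:X4}");
        }
    }
}
```

The output lists U+00FC, U+207F and U+044C: one stored byte, three different characters, depending only on which table is used to read it.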

Newer Windows programs typically use UTF-8 to store text files and internally use wide strings which are UTF-16. This eliminates code page issues, so a text file written in Hungary will look the same when opened in the US.
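The Hungary-to-US scenario can be simulated in a few lines: encode Hungarian 'ő' (U+0151) with code page 1250 (Central European), then decode the bytes with code page 1252 as a US machine would, and the text silently changes to 'õ'. Using UTF-8 (code page 65001) on both ends preserves it. The class name MojibakeDemo is hypothetical, and the provider registration assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo class: simulates saving text with one code page
// and opening it with another.
public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Makes the Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes 'text' with writeCodePage, then decodes the bytes with readCodePage.
    public static string RoundTrip(string text, int writeCodePage, int readCodePage)
    {
        byte[] onDisk = Encoding.GetEncoding(writeCodePage).GetBytes(text);
        return Encoding.GetEncoding(readCodePage).GetString(onDisk);
    }

    public static void Main()
    {
        const string hungarian = "\u0151"; // o with double acute, byte 0xF5 in code page 1250
        const int Utf8 = 65001;            // code page number for UTF-8

        // Written as 1250, read as 1252: comes back as U+00F5 (o with tilde).
        Console.WriteLine($"U+{(int)RoundTrip(hungarian, 1250, 1252)[0]:X4}");
        // Written and read as UTF-8: comes back unchanged as U+0151.
        Console.WriteLine($"U+{(int)RoundTrip(hungarian, Utf8, Utf8)[0]:X4}");
    }
}
```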
