What's the difference between Encoding.GetEncoding(1255) and Encoding.GetEncoding(1252)?
I have a C# Forms-based program and have been using
System.Text.Encoding.GetEncoding(1252)
but I've had trouble reading non-English characters. I've discovered that
System.Text.Encoding.GetEncoding(1255)
works; however, I don't know the implications of changing this, so I'm hoping someone can shed some light on the difference and the possible implications.
I recommend that you read Joel Spolsky's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
When you use GetEncoding(1252), you're specifying the Windows-1252 encoding, which covers the Latin alphabet as used in Western European languages. GetEncoding(1255) is the Windows-1255 encoding, which is used to write Hebrew.
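To make the difference concrete, here is a minimal sketch that decodes the same raw bytes with both code pages. The class and method names (`CodePageDemo`, `Decode`) are made up for illustration. It assumes a modern .NET runtime (.NET Core 3.0 or later), where legacy code pages have to be registered first; on .NET Framework they are built in and the static constructor can be deleted.

```csharp
using System.Text;

public static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where 1252 and 1255
        // are only available after registering the code-pages provider.
        // On .NET Framework this constructor is unnecessary.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Interprets the same raw bytes using the given Windows code page.
    public static string Decode(byte[] raw, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(raw);
    }
}
```

For example, the bytes `F9 EC E5 ED` decode to "שלום" with `CodePageDemo.Decode(bytes, 1255)` but to "ùìåí" with `CodePageDemo.Decode(bytes, 1252)`. The bytes never change; only the table used to interpret them does.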
Character encoding 1255 includes Hebrew letters, whereas 1252 is geared towards Western European languages. Is it the case that the non-English characters happen to be Hebrew?
1252 is Windows-1252 Western European (Windows)
1255 is Windows-1255 Hebrew (Windows)
source: http://msdn.microsoft.com/en-us/library/system.text.encodinginfo.codepage.aspx
Your encoding should always match the one that was used to create the file. If there is no metadata (or person) available to guide this selection, then the only thing to do would be to try each one and see which is legible. Since this is apparently in a language that you don't know, you may need to ask someone who speaks the language if it's legible. Do you know anyone who can read Hebrew?
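If you do end up having to try each one, a rough sketch like the following decodes the same bytes with a list of candidate code pages so that someone who reads the language can pick the legible result. `EncodingProbe` and `DecodeWithEach` are hypothetical names, the candidate list is whatever you choose to pass in, and the provider registration again assumes .NET Core 3.0 or later (drop it on .NET Framework).

```csharp
using System.Collections.Generic;
using System.Text;

public static class EncodingProbe
{
    static EncodingProbe()
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the same bytes once per candidate code page.
    // Nothing here detects the encoding; a human still has to judge
    // which of the returned strings is legible.
    public static Dictionary<int, string> DecodeWithEach(byte[] raw, params int[] codePages)
    {
        var results = new Dictionary<int, string>();
        foreach (int codePage in codePages)
        {
            results[codePage] = Encoding.GetEncoding(codePage).GetString(raw);
        }
        return results;
    }
}
```

You could feed it `File.ReadAllBytes(path)` together with, say, `1252, 1255, 65001` (65001 is the code page number for UTF-8) and print each entry for review.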
You probably want to use one of the "named" Unicode encodings, e.g., Encoding.UTF8. But, to answer your question: code page 1252 is "Western European (Windows)" and 1255 is "Hebrew (Windows)".
If you're not aware, code pages are pretty much a relic of the pre-Unicode era, and you should try to stick with Unicode where possible.
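Moving to Unicode doesn't remove the need to know the original code page: existing data still has to be decoded with the encoding it was written in, after which it can be stored as UTF-8. A minimal sketch, assuming the source really is Windows-1255 (or whichever code page you pass in) and using the made-up names `LegacyToUtf8`, `ToUtf8` and `ConvertFile`; the provider registration is only needed on .NET Core 3.0 or later.

```csharp
using System.IO;
using System.Text;

public static class LegacyToUtf8
{
    static LegacyToUtf8()
    {
        // Assumption: modern .NET; unnecessary on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-encodes bytes from a known legacy code page into UTF-8 bytes.
    public static byte[] ToUtf8(byte[] legacyBytes, int sourceCodePage)
    {
        Encoding source = Encoding.GetEncoding(sourceCodePage);
        return Encoding.Convert(source, Encoding.UTF8, legacyBytes);
    }

    // Reads a file in the given legacy code page and writes it back as UTF-8
    // without a byte order mark. Paths are supplied by the caller.
    public static void ConvertFile(string inputPath, string outputPath, int sourceCodePage)
    {
        string text = File.ReadAllText(inputPath, Encoding.GetEncoding(sourceCodePage));
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }
}
```

Once the data is in UTF-8, the rest of the program can read it with Encoding.UTF8 and no longer depends on a particular code page.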