Unicode surrogate character encoding in C#
I've got a problem with Unicode characters. When I try to encode surrogate characters (between D800 and DFFF), they come out as FFFD. I tried the Encoding.Unicode.GetString() method, which doesn't work, and the Decoder.GetChars() method, which doesn't work with every surrogate character.
I use the following code:
Encoding code:
string unicodeChars = "a\uD800\uDA65";
FileStream stream = new FileStream(@"unicode_encoding.txt", FileMode.Create, FileAccess.Write);
byte[] buffer = Encoding.Unicode.GetBytes(unicodeChars);
stream.Write(buffer, 0, buffer.Length);
stream.Close();
Decoding code:
string decodedUnicodeChars;
FileStream stream2 = new FileStream(@"unicode_encoding.txt", FileMode.Open, FileAccess.Read);
StreamReader reader = new StreamReader(stream2, Encoding.Unicode);
decodedUnicodeChars = reader.ReadToEnd();
foreach (char c in decodedUnicodeChars)
{
    Console.Write("{0} ", Convert.ToInt32(c).ToString("X4"));
}
Output is:
0061 FFFD FFFD
string unicodeChars="a\uD800\uDA65";
This is a case of GIGO: garbage in, garbage out. The surrogate pair is not valid. A high surrogate (\uD800..\uDBFF) must be followed by a low surrogate, so the second code unit must be in the range \uDC00..\uDFFF.
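To illustrate the point, here is a minimal sketch that checks a string for unpaired surrogates before encoding it and then round-trips a well-formed pair. The helper name HasOnlyValidSurrogates and the sample string "a\uD800\uDE65" are made up for this example; the strict UnicodeEncoding instance is one possible way to get an exception instead of a silent FFFD replacement.

```csharp
using System;
using System.Text;

class SurrogateDemo
{
    // Hypothetical helper: true when every surrogate code unit in s
    // belongs to a well-formed high/low pair.
    static bool HasOnlyValidSurrogates(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (!char.IsSurrogate(s[i])) continue;
            if (i + 1 < s.Length && char.IsSurrogatePair(s[i], s[i + 1]))
                i++; // skip the low surrogate that completes the pair
            else
                return false; // lone or misordered surrogate
        }
        return true;
    }

    static void Main()
    {
        string invalid = "a\uD800\uDA65"; // two high surrogates in a row
        string valid = "a\uD800\uDE65";   // high surrogate + low surrogate (assumed sample)

        Console.WriteLine(HasOnlyValidSurrogates(invalid)); // False
        Console.WriteLine(HasOnlyValidSurrogates(valid));   // True

        // A well-formed pair survives the UTF-16 round trip unchanged.
        byte[] bytes = Encoding.Unicode.GetBytes(valid);
        string roundTripped = Encoding.Unicode.GetString(bytes);
        foreach (char c in roundTripped)
            Console.Write("{0:X4} ", (int)c); // 0061 D800 DE65
        Console.WriteLine();

        // With throwOnInvalidBytes = true, the encoder throws on the
        // invalid input instead of substituting FFFD.
        var strict = new UnicodeEncoding(false, true, true);
        try
        {
            strict.GetBytes(invalid);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Invalid surrogate sequence rejected.");
        }
    }
}
```

The default Encoding.Unicode instance replaces invalid sequences rather than throwing, which is why the original code produced FFFD without any error.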