Utf7Encoding Text truncation
I was having an issue with the Utf7Encoding class truncating the '+4' sequence. I would be very interested to know why this was happening. I tried Utf8Encoding for getting string from the byte[] array and it seem to work honky dory. Are there any known issues like that with Utf8? Essentially I use the output produced by this conversion to construct html out of rtf string.
Here is the snippet:
UTF7Encoding utf = new UTF7Encoding();
UTF8Encoding utf8 = new UTF8Encoding();
string test = "blah blah 9+4";
char[] chars = test.ToCharArray();
byte[] charBytes = new byte[chars.Length];
for (int i = 0; i < chars.Length; i++)
{
charBytes[i] = (byte)chars[i]开发者_如何转开发;
}
string resultString = utf8.GetString(charBytes);
string resultStringWrong = utf.GetString(charBytes);
Console.WriteLine(resultString); //blah blah 9+4
Console.WriteLine(resultStringWrong); //blah 9
Converting to byte array through char array like that does not work. If you want the strings as charset-specific byte[]
do this:
UTF7Encoding utf = new UTF7Encoding();
UTF8Encoding utf8 = new UTF8Encoding();
string test = "blah blah 9+4";
byte[] utfBytes = utf.GetBytes(test);
byte[] utf8Bytes = utf8.GetBytes(test);
string utfString = utf.GetString(utfBytes);
string utf8String = utf8.GetString(utf8Bytes);
Console.WriteLine(utfString);
Console.WriteLine(utf8String);
Output:
blah blah 9+4
blah blah 9+4
Your are not transating the string to utf7 bytes correctly. You should call utf.GetBytes()
instead of casting the characters to a byte.
I suspect that in utf7 the ascii code corresponding to '+' is actually reserved for encoding international unicode characters.
精彩评论