Silverlight UTF8 encoder produces wacky output
I've been trying to trace down a bug for hours now and it has come down to this:
Dim length as Integer = 300
Dim buffer() As Byte = binaryReader.ReadBytes(length)
Dim text As String = System.Text.Encoding.UTF8.GetString(buffer, 0, buffer.Length)
The problem is the buffer contains 300 bytes but the length of the string 'text' is now 285. When I convert it back to bytes, the length is 521 bytes... WTF?
The same code is a normal WinForms app works perfectly. The data being read by the binary reader is a UTF8 encoded str开发者_JS百科ing. Any ideas why Silverlight is playing funny buggers?
I bet your stream contains some characters that require more than one byte. UTF8 uses a single byte when possible, but uses more bytes when the character is outside the ASCII range.
This explains why your buffer is longer than the string (300 vs 285).
Example:
string: "t e s t ä " (length = 5 -last char takes 2 bytes)
bytes: 0x74 | 0x65 | 0x73 | 0x74 | 0xc3 0xa4 (length = 6)
As to why it becomes even longer when you convert the text back to bytes, my best guess (also looking at the 521 size you get) is that you are using Encoding.Unicode instead of Encoding.UTF8 to perform the conversion. Unicode always uses two bytes for each character.
(btw. obviously this has nothing to do with Silverlight. You are probably testing the code with two different strings in Winforms vs. Silverlight. No worry, we've all done stupid mistakes like that :-) )
精彩评论