Problem with Encoding.UTF8.GetBytes in C#
I am working in C# and trying the code below:
byte[] buffer = new byte[str.Length];
buffer = Encoding.UTF8.GetBytes(str);
str holds a long string, but I am not getting the complete encoded bytes. What is going wrong, and how can I fix it?
Why are you creating a new byte array and then ignoring it? The value of buffer before the call to GetBytes is being replaced with a reference to a new byte array returned by GetBytes.
However, you shouldn't expect the UTF-8 encoded version of a string to be the same length in bytes as the original string's length in characters, unless it's all ASCII. Any character over U+007F takes up at least 2 bytes.
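To make the length difference concrete, here is a minimal sketch. The class name Utf8LengthDemo, the Measure helper and the sample strings are made up for illustration; only Encoding.UTF8.GetBytes and string.Length come from the question.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: returns the string's length in chars (UTF-16 code
    // units) alongside the length of its UTF-8 encoding in bytes.
    public static (int Chars, int Bytes) Measure(string s)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(s);
        return (s.Length, buffer.Length);
    }

    public static void Main()
    {
        // Pure ASCII: one byte per character.
        Console.WriteLine(Measure("hello"));        // 5 chars, 5 bytes

        // U+00E9 (e with acute accent) needs two bytes in UTF-8.
        Console.WriteLine(Measure("h\u00E9llo"));   // 5 chars, 6 bytes

        // U+20AC (euro sign) needs three bytes in UTF-8.
        Console.WriteLine(Measure("\u20AC5"));      // 2 chars, 4 bytes
    }
}
```

So a buffer sized with str.Length would be too small for anything beyond ASCII, which is one more reason to let GetBytes allocate the array itself.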
What's the bigger picture here? What are you trying to achieve, and why does the length of the byte array matter to you?
The proper use is:
byte[] buffer = Encoding.UTF8.GetBytes(str);
In general, you should not make any assumptions about length/size/count when working with encoding, bytes and chars/strings. Let the Encoding objects do their work and then query the resulting objects for that info.
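As a rough illustration of "query the resulting objects", the sketch below asks the encoding for the byte count and then checks the array it actually returns. The class name EncodingSizeDemo, the Inspect helper and the sample text are assumptions for the example; GetByteCount, GetBytes and GetString are the standard Encoding methods.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Hypothetical helper: reports the byte count the encoding predicts, the
    // size of the array it actually produces, and whether decoding the bytes
    // gives back the original string.
    public static (int Predicted, int Actual, bool RoundTrips) Inspect(string str)
    {
        // Ask the encoding how many bytes it would produce.
        int predicted = Encoding.UTF8.GetByteCount(str);

        // Let GetBytes allocate a correctly sized array; no pre-allocation needed.
        byte[] buffer = Encoding.UTF8.GetBytes(str);

        // Decode again to confirm nothing was lost.
        string decoded = Encoding.UTF8.GetString(buffer);

        return (predicted, buffer.Length, decoded == str);
    }

    public static void Main()
    {
        // "naive cafe" with two accented letters: 10 chars, 12 UTF-8 bytes.
        string str = "na\u00EFve caf\u00E9";
        var result = Inspect(str);
        Console.WriteLine($"chars: {str.Length}, predicted: {result.Predicted}, actual: {result.Actual}, round trip ok: {result.RoundTrips}");
    }
}
```

If the decoded string matches the original, the encoded bytes are complete, even when buffer.Length differs from str.Length.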
Having said that, I don't believe there is an inherent length restriction for the encoding classes. I have several production apps doing the same work in the opposite direction (bytes decoded to chars) which process byte arrays in the tens of megabytes.