
Problem with Encoding.UTF8.GetBytes in C#

I am working in C# and trying the code below:

byte[] buffer = new byte[str.Length];
buffer = Encoding.UTF8.GetBytes(str);

str holds a long string, but I am not getting the complete set of encoded bytes. What is going wrong, and how can I fix it?


Why are you creating a new byte array and then ignoring it? The value of buffer before the call to GetBytes is being replaced with a reference to a new byte array returned by GetBytes.

However, you shouldn't expect the UTF-8 encoded version of a string to be the same length in bytes as the original string's length in characters, unless it's all ASCII. Any character over U+007F takes up at least 2 bytes.
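To make the character-count versus byte-count difference concrete, here is a minimal sketch. The class name Utf8LengthDemo and the helper Utf8ByteLength are made up for illustration; only Encoding.UTF8.GetBytes and string.Length come from the framework.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: number of bytes in the UTF-8 encoding of text.
    public static int Utf8ByteLength(string text)
    {
        return Encoding.UTF8.GetBytes(text).Length;
    }

    public static void Main()
    {
        string ascii = "hello";         // 5 chars, all at or below U+007F
        string accented = "h\u00E9llo"; // 5 chars, U+00E9 needs 2 bytes
        string euro = "\u20AC";         // 1 char, U+20AC needs 3 bytes

        Console.WriteLine(ascii.Length + " chars, " + Utf8ByteLength(ascii) + " bytes");       // 5 chars, 5 bytes
        Console.WriteLine(accented.Length + " chars, " + Utf8ByteLength(accented) + " bytes"); // 5 chars, 6 bytes
        Console.WriteLine(euro.Length + " chars, " + Utf8ByteLength(euro) + " bytes");         // 1 chars, 3 bytes
    }
}
```

A buffer sized with str.Length is therefore only large enough when the text is pure ASCII.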

What's the bigger picture here? What are you trying to achieve, and why does the length of the byte array matter to you?


The proper use is:

 byte[] buffer = Encoding.UTF8.GetBytes(str);


In general, you should not make any assumptions about length/size/count when working with encoding, bytes and chars/strings. Let the Encoding objects do their work and then query the resulting objects for that info.
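If a pre-sized buffer really is needed, one way to follow this advice is to ask the encoding for the size instead of guessing it from the string length. The sketch below assumes that requirement; the class and method names (PreallocatedEncodeDemo, EncodeIntoExactBuffer) are hypothetical, while GetByteCount and the five-argument GetBytes overload are standard Encoding members.

```csharp
using System;
using System.Text;

public static class PreallocatedEncodeDemo
{
    // Hypothetical helper: encodes text into a buffer sized by the encoding itself.
    public static byte[] EncodeIntoExactBuffer(string text)
    {
        // Ask the encoding how many bytes it will produce.
        int needed = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = new byte[needed];

        // Write into the existing array; the return value is the number of bytes written.
        int written = Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);
        if (written != needed)
        {
            throw new InvalidOperationException("Unexpected byte count.");
        }
        return buffer;
    }

    public static void Main()
    {
        string str = "na\u00EFve caf\u00E9"; // 10 chars, two of them non-ASCII
        byte[] buffer = EncodeIntoExactBuffer(str);
        Console.WriteLine(str.Length + " chars, " + buffer.Length + " bytes"); // 10 chars, 12 bytes
    }
}
```

In most code the one-line form, byte[] buffer = Encoding.UTF8.GetBytes(str);, is simpler and does the same job; buffer.Length then gives the byte count.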

Having said that, I don't believe the encoding classes have any inherent length restriction. I have several production apps doing the same work in the opposite direction (decoding bytes to chars) that process byte arrays in the tens of megabytes.
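A quick way to check this on your own machine is a round trip on a large string. This is only a sketch: the 5,000,000-character size and the names LargeRoundTripDemo and RoundTrips are arbitrary choices for illustration, not figures from the apps mentioned above.

```csharp
using System;
using System.Text;

public static class LargeRoundTripDemo
{
    // Hypothetical helper: encodes text to UTF-8, decodes it again,
    // and reports whether the result matches the input.
    public static bool RoundTrips(string text, out int byteCount)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        byteCount = bytes.Length;
        string decoded = Encoding.UTF8.GetString(bytes);
        return string.Equals(text, decoded, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // 5,000,000 copies of U+00E9, each of which takes 2 bytes in UTF-8.
        string big = new string('\u00E9', 5000000);

        int byteCount;
        bool same = RoundTrips(big, out byteCount);
        Console.WriteLine(big.Length + " chars, " + byteCount + " bytes, round trip ok: " + same);
    }
}
```

If the bytes seem truncated in a real program, the cause is more likely to be in how the array is written, sent, or displayed afterwards than in GetBytes itself.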
