
It should be so obvious, but why does this fail?

Been coding .NET for years now, yet I feel like a n00b. Why is the following code failing?

byte[] a = Guid.NewGuid().ToByteArray(); // 16 bytes in array
string b = new UTF8Encoding().GetString(a);
byte[] c = new UTF8Encoding().GetBytes(b);
Guid d = new Guid(c);    // Throws exception (32 bytes received from c)

Update

Accepted the answer from CodeInChaos. The reason the 16 bytes become 32 bytes can be read in his answer. Also stated in the answer:

the default constructor of UTF8Encoding has error checking disabled

IMHO the UTF-8 decoder should throw an exception when asked to turn a byte array containing invalid bytes into a string. To make the .NET Framework behave properly, the code should have been written as follows:

 byte[] a = Guid.NewGuid().ToByteArray();
 string b = new UTF8Encoding(false, true).GetString(a);  // Throws exception as expected
 byte[] c = new UTF8Encoding(false, true).GetBytes(b);
 Guid d = new Guid(c);


Not every sequence of bytes is a valid UTF-8 encoded string.

A GUID can contain almost any sequence of bytes, but UTF-8 has specific rules about which byte sequences are allowed when a byte value is above 127, and a Guid will quite often not follow these rules.

Then when you encode the corrupted string back to a byte array you get a byte array longer than 16 bytes, which the constructor of Guid doesn't accept.
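A small sketch of what that looks like (assuming .NET 2.0 or later, where the lenient decoder substitutes the U+FFFD replacement character rather than silently dropping invalid bytes; the variable names are just for illustration):

byte[] invalid = { 0x41, 0xFF };                          // 'A' followed by a byte that can never occur in valid UTF-8
string decoded = new UTF8Encoding().GetString(invalid);   // "A\uFFFD" - the invalid byte becomes U+FFFD
byte[] reencoded = new UTF8Encoding().GetBytes(decoded);  // 4 bytes: 'A' stays 1 byte, U+FFFD re-encodes to 3 bytes

Scale that up to the 16 random bytes of a Guid and the round trip easily grows past 16 bytes.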


The documentation on UTF8Encoding.GetString states:

With error detection, an invalid sequence causes this method to throw an ArgumentException. Without error detection, invalid sequences are ignored, and no exception is thrown.

and the default constructor of UTF8Encoding has error checking disabled (don't ask me why):

This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.
Note: For security reasons, your applications are recommended to enable error detection by using the constructor that accepts a throwOnInvalidBytes parameter and setting that parameter to true.


You might want to use Base64 encoding instead of UTF-8. That way you can map any byte sequence to a string and back.
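For example, a Guid round-trips cleanly through Base64 (a minimal sketch using Convert.ToBase64String/FromBase64String):

byte[] a = Guid.NewGuid().ToByteArray();
string b = Convert.ToBase64String(a);             // 24 characters, always a valid string
Guid d = new Guid(Convert.FromBase64String(b));   // same Guid back, no exception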


To encode arbitrary byte data as a string you should use base-64, hex, etc. You cannot assume that a random set of bytes makes a valid UTF-* (or other encoding) string.

http://marcgravell.blogspot.com/2010/03/binary-data-and-strings.html
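A hex round trip looks much the same (sketch; Convert.ToHexString/FromHexString exist only on .NET 5 and later, on older frameworks BitConverter.ToString plus manual parsing does the same job):

byte[] a = Guid.NewGuid().ToByteArray();
string hex = Convert.ToHexString(a);              // 32 characters, two per byte (.NET 5+)
Guid d = new Guid(Convert.FromHexString(hex));    // same Guid back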


Because b is of type string, which means it's a Unicode string (2 bytes per character). In your second line you're creating a 16-character string out of a 16-byte array, but that 16-character string is stored in 32 bytes.

Why not just do this:

var d = Guid.NewGuid();
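And if a string form is genuinely needed, Guid already defines a lossless text representation of its own, for example:

Guid d = Guid.NewGuid();
string s = d.ToString();       // "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Guid back = Guid.Parse(s);     // or new Guid(s) on frameworks before .NET 4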