
.NET flaw with string to byte[] conversion?

I ran into a problem retrieving encrypted data from an NVARCHAR field in our SQL Server (2008 R2) database: for some records, the string value in my C# .NET application differs from the value in the database record. This was hard to prove, but I eventually confirmed it by comparing the byte[] representations of the strings, which did indeed differ.

Playing around further, I was able to produce this test application, which has me a little concerned. I took a byte array (converted from hex for simplicity of setup), converted it to a string with the Unicode encoder and back to a byte array, and saw that the resulting byte array was different from the original one! In the code below, the first hex string fails while the second works.

Is there something wrong with my method here (and I don't mean the idea of converting byte arrays to strings), or is there potentially something wrong in the .NET framework?

using System;

namespace ByteArrayTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WindowWidth = 80;
            Console.Clear();

            foreach (string s in new string[]
                {
                    "00CD6C8300C2A2C09B9E6B1F258F7B1101000000AB4CB23EBE32F0DD",
                    "00CD6C8300C2A2C09B9E6B1F258F7B1101000000E12617F83C3F7F6A"
                }
            )
            {
                byte[] b1 = System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary.Parse(s).Value;
                string tmp = System.Text.Encoding.Unicode.GetString(b1);

                byte[] b2 = System.Text.Encoding.Unicode.GetBytes(tmp);

                Console.WriteLine("Orig: {0}", s);

                string s2 = BitConverter.ToString(b2).Replace("-", "");
                Console.WriteLine("Conv: {0}", s2);

                Console.WriteLine(s == s2 ? "EQUAL :-)" : "** NOT EQUAL **");
                Console.WriteLine();
            }

            Console.WriteLine("Press ENTER to exit...");
            Console.ReadLine();
        }
    }
}

I'm using VS2010 and tested this under .NET frameworks 4 and 3.5 and the results of this are:

Orig: 00CD6C8300C2A2C09B9E6B1F258F7B1101000000AB4CB23EBE32F0DD
Conv: 00CD6C8300C2A2C09B9E6B1F258F7B1101000000AB4CB23EBE32FDFF
** NOT EQUAL **

Orig: 00CD6C8300C2A2C09B9E6B1F258F7B1101000000E12617F83C3F7F6A
Conv: 00CD6C8300C2A2C09B9E6B1F258F7B1101000000E12617F83C3F7F6A
EQUAL :-)

Regards,


If you're trying to store arbitrary opaque binary data which isn't really text in an NVARCHAR field, you should use base64 encoding to encode it. Treating it as UTF-16-encoded text (which is what you're doing here) is a fundamentally bad idea, and very likely to lose data. As one example of where this could happen, you could end up with a string which contains half of a surrogate pair without the other half.

I assume your "encrypted data" was stored by just calling Encoding.Unicode.GetString(bytes) where bytes is the encrypted data? If so, that's definitely not the way to go. Use:

string text = Convert.ToBase64String(bytes);

instead, and when retrieving the data, use

byte[] bytes = Convert.FromBase64String(text);
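As a rough illustration of the two calls above, here is a minimal sketch that pushes the failing byte sequence from the question through Base64 and back. The class name `Base64RoundTrip` and the helpers `FromHex` and `SurvivesBase64` are hypothetical names made up for this example; only `Convert.ToBase64String` and `Convert.FromBase64String` come from the answer.

```csharp
using System;
using System.Linq;

public static class Base64RoundTrip
{
    // Hypothetical helper: parses an even-length hex string into bytes.
    public static byte[] FromHex(string hex)
    {
        byte[] result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return result;
    }

    // Encodes the bytes as Base64 text, decodes them again,
    // and reports whether every byte survived the round trip.
    public static bool SurvivesBase64(byte[] original)
    {
        string text = Convert.ToBase64String(original);     // plain ASCII, safe in NVARCHAR
        byte[] restored = Convert.FromBase64String(text);
        return original.SequenceEqual(restored);
    }

    public static void Main()
    {
        // The hex string that failed the UTF-16 round trip in the question.
        byte[] data = FromHex("00CD6C8300C2A2C09B9E6B1F258F7B1101000000AB4CB23EBE32F0DD");
        Console.WriteLine(SurvivesBase64(data) ? "EQUAL" : "** NOT EQUAL **");
    }
}
```

Base64 maps every possible byte sequence to text, so no input can be "invalid" in the way a lone surrogate is invalid UTF-16. The cost is that the stored text is roughly a third larger than the raw bytes.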

Alternatively, use a database field which is designed for binary data in the first place.

EDIT: (Copied from my comment) The example you've given is failing at the end, converting U+DDF0 to U+FFFD. That's exactly the scenario I mentioned above: the final bytes F0 DD are read as the little-endian UTF-16 code unit U+DDF0, which is a "low surrogate", but it doesn't have a corresponding "high surrogate" before it, so Encoding.GetString converts that character into U+FFFD, which is the "replacement character", which is (from the Unicode chart)

used to replace an incoming character whose value is unknown or unrepresentable in Unicode
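The substitution can be reproduced in isolation. The sketch below decodes only the last two bytes of the failing input with the default `Encoding.Unicode`; the class name `LoneSurrogateDemo` and the helper `DecodeSingleUnit` are hypothetical names used for illustration.

```csharp
using System;
using System.Text;

public static class LoneSurrogateDemo
{
    // Decodes two little-endian UTF-16 bytes with the default Encoding.Unicode
    // and returns the first code unit of the resulting string.
    public static char DecodeSingleUnit(byte low, byte high)
    {
        string s = Encoding.Unicode.GetString(new byte[] { low, high });
        return s[0];
    }

    public static void Main()
    {
        // Bytes F0 DD, read as little-endian UTF-16, form the code unit U+DDF0.
        char raw = (char)0xDDF0;
        Console.WriteLine(char.IsLowSurrogate(raw));          // True

        // With no preceding high surrogate, the decoder substitutes U+FFFD.
        char decoded = DecodeSingleUnit(0xF0, 0xDD);
        Console.WriteLine(((int)decoded).ToString("X4"));     // FFFD
    }
}
```

Encoding U+FFFD back to bytes gives FD FF, which matches the last four hex digits of the "Conv:" line in the question.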

IIRC, you can specify what Encoding does when it encounters bad binary data (which is effectively what you're giving it) and potentially make it throw an exception instead.
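One way to get that behaviour, as far as I know, is the three-argument `UnicodeEncoding` constructor, whose last parameter is `throwOnInvalidBytes`. The following is a sketch rather than a recommendation for storing binary data; the class name `StrictUtf16` and the method `IsValidUtf16` are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictUtf16
{
    // Little-endian, no byte order mark, throw on invalid byte sequences.
    private static readonly Encoding Strict = new UnicodeEncoding(false, false, true);

    // Returns true when the bytes decode as well-formed little-endian UTF-16.
    public static bool IsValidUtf16(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidUtf16(new byte[] { 0xF0, 0xDD })); // lone low surrogate
        Console.WriteLine(IsValidUtf16(new byte[] { 0x41, 0x00 })); // the letter "A"
    }
}
```

A strict decoder like this turns silent corruption into a visible failure, but it does not make arbitrary bytes storable as text; Base64 or a binary column is still the fix.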
