
Storing Multibyte in SQL Server

What data type should be used in SQL Server to store multibyte characters: NVARCHAR or VARCHAR?


If you want to store the original MB string, it gets dicey. Your best choice is actually the varbinary type, since neither varchar nor nvarchar is a correct type for MB strings.

In your place I would convert the MB string to Unicode using MultiByteToWideChar and store it as NVARCHAR. When the application needs it again, I would convert it back from Unicode to an MB string using WideCharToMultiByte. This way the data in the database stays coherent for any other use that expects Unicode and/or ASCII strings. There aren't that many consumers prepared to deal with multibyte strings stored as byte arrays...
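To illustrate the round trip, here is a minimal sketch in Python rather than the Win32 API: `bytes.decode` plays the role of MultiByteToWideChar and `str.encode` the role of WideCharToMultiByte. The code page (`cp932`, Shift-JIS) and the helper names are assumptions for the example only; substitute whatever code page your application really uses.

```python
# Python analogue of the MultiByteToWideChar / WideCharToMultiByte round trip.
# Assumption: the multibyte data is in code page 932 (Shift-JIS).
SOURCE_CODEC = "cp932"  # hypothetical; use the application's real code page


def mb_to_unicode(mb_bytes: bytes, codec: str = SOURCE_CODEC) -> str:
    """Decode multibyte bytes into a Unicode string (the value for NVARCHAR)."""
    return mb_bytes.decode(codec)


def unicode_to_mb(text: str, codec: str = SOURCE_CODEC) -> bytes:
    """Encode a Unicode string read from NVARCHAR back into multibyte bytes."""
    return text.encode(codec)


original = "日本語".encode(SOURCE_CODEC)   # bytes as the application holds them
as_unicode = mb_to_unicode(original)       # value to store in the NVARCHAR column
round_tripped = unicode_to_mb(as_unicode)  # value handed back to the application
```

If the bytes are not valid in the assumed code page, `decode` raises `UnicodeDecodeError`, which is usually preferable to silently storing mangled text.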


It should be either NCHAR or NVARCHAR... [Edit] ... after you convert the strings to UCS-2.

In both cases this stores Unicode characters (using the UCS-2 character set). Use NCHAR for fixed-length fields (a fixed number of characters) and NVARCHAR for variable-length fields.

The advantage of using UCS-2 is that all supporting logic is easier, since every character requires exactly two bytes of storage. The disadvantage is that a lot of space is wasted if the text consists mostly of ASCII characters.
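A quick way to see the space cost is to compare encoded lengths. This sketch uses `latin-1` as a stand-in for a single-byte VARCHAR code page and `utf-16-le` as a stand-in for the two-byte storage; both choices, and the helper name `storage_bytes`, are assumptions for illustration. It measures the character data only, not any per-row overhead.

```python
def storage_bytes(text: str) -> dict:
    """Return the encoded size of text in a one-byte and a two-byte encoding."""
    return {
        "single_byte": len(text.encode("latin-1")),   # stand-in for VARCHAR
        "two_byte": len(text.encode("utf-16-le")),    # stand-in for NVARCHAR
    }


sizes = storage_bytes("Hello, world")
print(sizes)  # {'single_byte': 12, 'two_byte': 24}
```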

Thank you, Remus Rusanu, for pointing out that multibyte strings are NOT directly compatible with the Unicode UCS sets. If you have, say, UTF-8 or UTF-16, it will not be stored or handled properly by SQL Server until it is converted to UCS-2.

If one is only trying to store this data, without searching by it, etc., the VARBINARY format is indeed a good choice. Otherwise, one needs to convert as I suggest. (UCS-2 should handle most, but alas not all, character sets out there.)
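The "not all" part can be checked before converting. UCS-2 covers only code points up to U+FFFF (the Basic Multilingual Plane), so anything above that needs a surrogate pair in UTF-16 and has no single UCS-2 code unit. A minimal sketch, with `fits_in_ucs2` as a hypothetical helper name:

```python
def fits_in_ucs2(text: str) -> bool:
    """True if every character is a single 16-bit code unit (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_in_ucs2("日本語"))  # True: all three characters are in the BMP
print(fits_in_ucs2("😀"))      # False: U+1F600 lies outside the BMP
```

Running such a check on incoming data is one way to decide whether a value can go through the conversion or should be kept as VARBINARY instead.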


It's possible to store UTF-8 in varchar fields if you really need to, but there are a whole host of caveats.

  • No off-the-shelf data tools would be able to properly display the data without explicit conversions.

  • All character set translation MUST be disabled in the data drivers.

  • The system must not accept truncated data unless it understands character boundaries.

  • Sort operations may not be correct for all alphabets.
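The truncation caveat is the easiest one to get wrong, so here is a minimal sketch of boundary-aware truncation. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`; the function name `truncate_utf8` is hypothetical, and `max_bytes` is assumed to be non-negative and to correspond to the column's byte limit.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 bytes to at most max_bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move back until the cut sits in front of a lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


encoded = "日本語".encode("utf-8")   # 9 bytes, 3 per character
short = truncate_utf8(encoded, 4)    # 4 would split the second character
print(short.decode("utf-8"))         # 日
```

A plain `data[:max_bytes]` would leave a dangling lead byte here, and the stored value would no longer decode as UTF-8.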

Storing UTF-8 in varchar works if you have no other choice and need to support an encoding like UTF-8 without re-engineering your application and schema. It works quite well for web applications where input and output are browser-based. Having said all this, I'd still recommend going with NVARCHAR and converting to UCS-2.
