Does an nvarchar always store each character in two bytes?

2023-02-05 06:25

I had (perhaps naively) assumed that in SQL Server, an nvarchar would store each character in two bytes. But this does not always seem to be the case. The documentation out there suggests that some characters might take more bytes. Does someone have a definitive answer?

Yes, it does: it uses 2 bytes per character. Use DATALENGTH to get the storage size; you can't use LEN, because LEN just counts the characters. See here: The differences between LEN and DATALENGTH in SQL Server

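DECLARE @n NVARCHAR(10)
DECLARE @v VARCHAR(10)

-- N'...' is a Unicode literal; a plain '...' literal is varchar
SELECT @n = N'A', @v = 'A'

-- DATALENGTH returns the number of bytes, not the number of characters
SELECT DATALENGTH(@n), DATALENGTH(@v)

---------
2 1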

Here is what Books Online has: http://msdn.microsoft.com/en-us/library/ms186939.aspx

Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

nchar [ ( n ) ]

Fixed-length Unicode character data of n characters. n must be a value from 1 through 4,000. The storage size is two times n bytes. The ISO synonyms for nchar are national char and national character.

nvarchar [ ( n | max ) ]

Variable-length Unicode character data. n can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for nvarchar are national char varying and national character varying.
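As a rough worked example of the storage formula quoted above (two bytes per character, plus two bytes of overhead), here is a small Python sketch. The function name `nvarchar_storage_bytes` is made up for illustration, and it deliberately assumes that every character occupies exactly one 2-byte unit. That is the model the quoted documentation describes, not a claim about how every character is actually stored.

```python
def nvarchar_storage_bytes(char_count: int) -> int:
    """Storage size per the quoted formula: 2 * n + 2 bytes.

    Hypothetical helper for illustration; assumes each character
    takes exactly one 2-byte unit, as the quoted text states.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return 2 * char_count + 2


for n in (0, 1, 10, 4000):
    print(n, "characters ->", nvarchar_storage_bytes(n), "bytes")
```

Note that the DATALENGTH example earlier in the thread reports 2 for a single character, which corresponds to the data bytes alone, without the 2-byte overhead.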

That said, Unicode compression was introduced in SQL Server 2008 R2, so it might store ASCII characters as 1 byte. You can read about Unicode compression here:

SQL Server 2008 R2 : A quick experiment in Unicode Compression
SQL Server 2008 R2 : Digging deeper into Unicode compression
More testing of Unicode Compression in SQL Server 2008 R2

Given that there are more than 65536 characters, it should be obvious that a character cannot possibly fit in just two octets (i.e. 16 bits).

SQL Server, like most of Microsoft's products (Windows, .NET, NTFS, …) uses UTF-16 to store text, in which a character takes up either two or four octets, although as @SQLMenace points out, current versions of SQL Server use compression to reduce that.
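The "two or four octets" point can be checked outside SQL Server with any UTF-16 encoder. The following Python sketch only illustrates the encoding itself; it says nothing about SQL Server's on-disk format or compression, and the helper name `utf16_byte_length` is made up for this example.

```python
def utf16_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-16 (no BOM)."""
    return len(text.encode("utf-16-le"))


# Three characters inside the Basic Multilingual Plane, and one outside it.
samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf16_byte_length(ch)} bytes")
```

The first three samples encode to two bytes each, while U+20000, which lies above U+FFFF, needs four.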

My understanding of this issue is that SQL Server uses UCS-2 internally, but that its UCS-2 implementation has been hacked to support a subset of characters of up to 4 bytes in the GB18030 character set, which are stored as UCS-2 but are transparently converted by the database engine back to multibyte characters when queried.

Surrogate/supplementary characters aren't fully supported: the implementation of a number of SQL Server string functions doesn't support surrogate pairs, as detailed here.
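To make "surrogate pair" concrete, here is a minimal Python sketch of how UTF-16 splits a supplementary code point into two 16-bit code units. The helper names are invented for illustration. The sketch shows only the standard UTF-16 arithmetic, so it is not a statement about which SQL Server functions behave which way.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, which is what a surrogate-unaware
    function would effectively be counting."""
    return len(text.encode("utf-16-le")) // 2


ch = "\U00020000"
high, low = to_surrogate_pair(ord(ch))
print(f"U+{ord(ch):X} -> {high:04X} {low:04X}")
print("characters:", len(ch), "code units:", utf16_code_units(ch))
```

A function that counts or slices by code unit rather than by character would see this single character as two units, which is the kind of mismatch the answer above is describing.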
