Unicode characters causing issues in SQL Server 2005 string comparison

2022-12-20 03:23 问答作者：

This query:

select *
from op.tag
where tag = 'fussball'

Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).

While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings which allow you to change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also causes fa开发者_高级运维ilure on inserts of one value when the other exists due to a constraint violation. How do I turn this off?

Follow-up bonus point question. Explain why this query does not return any rows:

select 1 
where 'fußball' = 'fussball'

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 
where 'fußball' = cast('fussball' as nvarchar)

But then again, this one does not:

select 1 
where cast('fußball' as varchar) = cast('fussball' as varchar)

I'm confused.

I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).

http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here's an insightful excerpt:

Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in a Danish tailoring of the UCA.

The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).

It does not with collation Latin1_General_BIN.

You can assign a table column a collation by using the COLLATE < collation > keyword after N/VARCHAR.

You can also compare strings with a specific collation using the syntax

string1 = string2 COLLATE < collation >

This isn't an answer that explains behavior, but may be relevant:

In this question, I learned that using the collation of

Latin1_General_Bin

will avoid most collation quirks.

Some helper answers - not the complete one to your question, but still maybe helpful:

If you try:

SELECT 1 WHERE N'fußball' = N'fussball'

you'll get "1" - when using the "N" to signify Unicode, the two strings are considered the same - why that's the case, I don't know (yet).

To find the default collation for a server, use

SELECT SERVERPROPERTY('Collation')

To find the collation of a given column in a database, use this query:

SELECT
  name 'Column Name', 
  OBJECT_NAME(object_id) 'Table Name', 
  collation_name
FROM sys.columns
WHERE object_ID = object_ID('your-table-name') 
AND name = 'your-column-name'

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 where 'fußball' = cast('fussball' as nvarchar)

Here you're dealing with the SQL Server data type precedence rules, as stated in Data Type Precedence. Comparisons are done always using the higher precedence type:

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.

Since nvarchar has a higher precedence than varchar, the comparison in your example will occur suing the nvarchar type, so it's really exactly the same as select 1 where N'fußball' =N'fussball' (ie. using Unicode types). I hope this also makes it clear why your last case doesn't return any row.

继续阅读：collation sql-server string-comparison unicode

Unicode characters causing issues in SQL Server 2005 string comparison

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？