开发者

Why do I get the wrong result when comparing UTF16 strings in a linq-to-sql select?

I am using C# and .Net 4.0 with MS SQL 2008开发者_StackOverflow中文版.

I am running an integration test to verify that data is getting correctly stored and retrieved. It fails more often than not. When I look into it I see that I am getting the wrong value back from the linq-to-sql call. I have profiled the linq-to-sql statement and discovered that in Server Management Studio, the profiled SQL returns the wrong value, while a hand typed query with the same parameters works correctly.

The linq-to-sql query and result:

exec sp_executesql N'SELECT TOP (1) [t0].[ID], [t0].[UserName], [t0].TCID
FROM [dbo].[Users] AS [t0]
WHERE ([t0].[TCID] = @p0) AND ([t0].[UserName] = @p1)',N'@p0 int,@p1
nvarchar(4000)',@p0=8,@p1=N'ҭРӱґѻ'

Results in

ID        UserName    TCID
2535      ҭРґѻӱ       8

As you can see, UserName does not match what was in the equality check.

If I do this, I get the expected result:

SELECT TOP 1000 [ID]
    ,[UserName]
    ,[TCID]
FROM [dbo].[Users]
where TCID=8 and username = 'ҭРӱґѻ'

I get back:

ID        UserName    TCID

Which is correct.

UserName is nvarchar(50), ID and TCID are int.

Any ideas why the first query gets the wrong result?


You're not getting results on the second query because you forgot to prefix the parameter with N. I bet you get a result just like with the dynamic SQL if you use:

SELECT TOP 1000 [ID]
    ,[UserName]
    ,[TCID]
FROM [dbo].[Users]
where TCID=8 and username = N'ҭРӱґѻ'; -- note the N prefix here

Now, I'm not saying you should get a result, but that should make the behavior consistent between your two testing methods. What is the collation of the column? You can "fix" this in a way by specifying a binary collation. For example, this should yield proper behavior:

SELECT COUNT(*) 
  FROM [dbo].[Users]
  WHERE [UserName] = N'ҭРӱґѻ' COLLATE Latin1_General_BIN;

-- 0

SELECT COUNT(*) 
  FROM [dbo].[Users]
  WHERE [UserName] = N'ҭРґѻӱ' COLLATE Latin1_General_BIN;

-- 1


With the collation you are using (probably a SQL Server-specific collation), some Unicode code points are not defined. Thus SQL Server treats them as if they were an empty string:

SELECT CASE WHEN N'ӱ' COLLATE SQL_Latin1_General_CP1_CI_AS = N'' THEN 'YES' ELSE 'NO' END

If we use a newer Windows collation such as Cyrillic_General_100_CI_AS, we see that these strings do not match:

SELECT CASE WHEN N'ӱ' COLLATE Cyrillic_General_100_CI_AS = N'' THEN 'YES' ELSE 'NO' END

Here's a blog post on MSDN that should explain more.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜