problem with IN and LIKE in SQL when arguments contain special characters for a language
I’ve got such a KeyWord table (MS SQL):
- KeyGuid Qualifier PrimitiveKey
- DA7E4E27-FDE5-4D43-A365-8A789164A816 tit kirkäna
- EED58875-FE41-4A18-A93C-A44AA62CEEEE htit kirkänbh 开发者_Go百科
- A0EB795E-EE23-4990-BAB9-897C93C70CE3 htit kirkänah
- F7F4632B-AC82-4DEB-B966-BBA8EF4D2C9E tit kirkänb
- C0EB795E-EE23-4990-BAB9-897C93C70CE3 nam kirkänas
- E2F4632B-AC82-4DEB-B966-BBA8EF4D2C9E nam kirkänbs
- A222795E-EE23-4990-BAB9-897C93C70CE3 tit kirkacb
- B333632B-AC82-4DEB-B966-BBA8EF4D2C9E tit kirkaca
- 1222795E-EE23-4990-BAB9-897C93C70C81 htit kirkacbh
- E533632B-AC82-4DEB-B966-BBA8EF4D2C82 htit kirkacah
This simplest query properly returns all relevant records:
select * from KeyWord where PrimitiveKey like 'kirkän%'
- DA7E4E27-FDE5-4D43-A365-8A789164A816 tit kirkäna
- EED58875-FE41-4A18-A93C-A44AA62CEEEE htit kirkänbh
- A0EB795E-EE23-4990-BAB9-897C93C70CE3 htit kirkänah
- F7F4632B-AC82-4DEB-B966-BBA8EF4D2C9E tit kirkänb
- C0EB795E-EE23-4990-BAB9-897C93C70CE3 nam kirkänas
- E2F4632B-AC82-4DEB-B966-BBA8EF4D2C9E nam kirkänbs
I use such a query to limit the resutls to match specific qualifiers:
select * from KeyWord where Qualifier IN ('tit', 'htit') and PrimitiveKey Like 'kirkac%'
which works perfectly:
- A222795E-EE23-4990-BAB9-897C93C70CE3 tit kirkacb
- B333632B-AC82-4DEB-B966-BBA8EF4D2C9E tit kirkaca
- 1222795E-EE23-4990-BAB9-897C93C70C81 htit kirkacbh
- E533632B-AC82-4DEB-B966-BBA8EF4D2C82 htit kirkacah
However when the phrase contains a special character like ä it does not return results:
select * from KeyWord where Qualifier IN ('tit', 'htit') and PrimitiveKey Like 'kirkän%'
nor does it with the qualifiers limited like this:
select * from KeyWord where (Qualifier = 'tit' OR Qualifier = 'htit') and PrimitiveKey Like 'kirkän%'
However it does work like this:
select * from KeyWord where (Qualifier like 'tit' OR Qualifier like 'htit') PrimitiveKey Like 'kirkän%'
- DA7E4E27-FDE5-4D43-A365-8A789164A816 tit kirkäna
- EED58875-FE41-4A18-A93C-A44AA62CEEEE htit kirkänbh
- A0EB795E-EE23-4990-BAB9-897C93C70CE3 htit kirkänah
- F7F4632B-AC82-4DEB-B966-BBA8EF4D2C9E tit kirkänb
What is wrong with the IN approach?
Maybe you need to use unicode compatible datatypes. Declaring the PrimitiveKey column as nvarchar, try prefixing the string you want to match with an 'N' like this: select * from KeyWord where (Qualifier like 'tit' OR Qualifier like 'htit') and PrimitiveKey Like N'kirkän%' .
take a look at http://msdn.microsoft.com/en-us/library/ms179886.aspx
basically LIKE operand has it:s own collation that overrides the server and column settings. However i haven't been able to figure out where or if there is a way to change this setting. The above article is a pretty tough read, but i think the most detailed explanation is at the bottom.
This is probably best handled with a specific collation setting that copes with the language-specific characters.
Here's an article on SQL Server collation that may help: http://msdn.microsoft.com/en-us/library/aa174903(SQL.80).aspx
I think you should take a look at Full-Text Search. I know changing the collation would help here, as Neil says, but you might get some benefit out of using FTS depending on how scalable your implementation needs to be.
I have done more investigation on the problem. Here's what I found.
A. The problematic query actually returns results but containing 'ae' only:
select * from KeyWord where Qualifier IN ('tit', 'htit') and PrimitiveKey Like 'kirkän%'
returns for example 'kirkaeni'.
B. If another % is included in the query (for example: Like 'ki%rkän%') the results include those expected! (this is weird) (but also those not wanted, matching the other %, of course).
C. I have tried to reproduce the problem - creating a simple DB only with two tables (the one with 'kirk' stuff has a foreign key to the other one), I have used queries creating the problematic DB as well those creating tables, I have set the same collation (German_PhoneBook_CI_AI) + I have created indexes as in the problematic DB. However the problem did not occur, so I cannot yet actually reproduce it.
Any new ideas with these symptoms?
精彩评论