Range wildcard pattern matching behaviour with case-sensitive collations
While using PATINDEX with a case-sensitive collation to search for upper-case letters in a string, I noticed that this query does not give the desired result:
-- returns 1
SELECT PATINDEX('%[A-Z]%'
, 'abCde' COLLATE SQL_Latin1_General_Cp1_CS_AS);
However, listing every letter from A to Z explicitly does:
-- returns 3
SELECT PATINDEX('%[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'
, 'abCde' COLLATE SQL_Latin1_General_Cp1_CS_AS);
Is my understanding of using a range in the first case incorrect? Why is the behaviour like this?
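Unfortunately, the range operators are a bit counter-intuitive: a range matches every character that sorts between its endpoints in the collation, not every character whose code lies between them. The range of letters from A-Z is: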
AbBcCdDeE...yYzZ
That is, each lower-case letter sorts immediately before its upper-case counterpart. This also means that if you want to handle both upper- and lower-case characters in a case-sensitive collation, the range A-Z excludes lower-case a while still including lower-case b through z.
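To check how a range actually expands under a particular collation, you can test every ASCII letter against the pattern and sort the results by that same collation. The query below is a minimal sketch, assuming SQL Server 2008 or later; it uses the collation from the question, and you can substitute any other collation name (in both places) to compare. The output depends on the collation you pick, so none is shown here.

```sql
-- Enumerate the ASCII letters A-Z (65-90) and a-z (97-122) and report
-- which of them the range [A-Z] matches under one collation.
-- Replace the collation name in BOTH places to compare other collations.
WITH Codes AS (
    SELECT 65 AS n
    UNION ALL
    SELECT n + 1 FROM Codes WHERE n < 122   -- 58 rows, well under the default MAXRECURSION of 100
)
SELECT CHAR(n) AS ch,
       CASE WHEN CHAR(n) COLLATE SQL_Latin1_General_CP1_CS_AS LIKE '[A-Z]'
            THEN 1 ELSE 0 END AS in_range
FROM Codes
WHERE n BETWEEN 65 AND 90 OR n BETWEEN 97 AND 122   -- skip the six punctuation characters between Z and a
ORDER BY CHAR(n) COLLATE SQL_Latin1_General_CP1_CS_AS;  -- show letters in the collation's own sort order
```

Reading down the `in_range` column in collation order shows exactly where the range starts and stops, including whether lower-case a and z fall inside it.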
I should add that the expansion above is based on the collations I generally work with; how a range actually expands depends on the collation. If you can find a collation where, for instance, all upper-case characters sort before all lower-case characters, then the range will work as you expect. (Possibly one of the binary collations?)
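The binary-collation idea can be sketched as follows. Binary collations compare characters by code point, so `[A-Z]` covers exactly the 26 unaccented upper-case ASCII letters (code points 65 to 90); accented capitals such as É fall outside it. `Latin1_General_BIN` is used here as an example binary collation; any other binary collation available on your server should behave the same way for this pattern.

```sql
-- Under a binary collation, [A-Z] means CHAR(65) through CHAR(90) by code point.
-- 'a' (97) and 'b' (98) are outside that span; 'C' (67) is inside it.
SELECT PATINDEX('%[A-Z]%', 'abCde' COLLATE Latin1_General_BIN) AS first_upper;  -- returns 3

-- No upper-case letters at all, so no match.
SELECT PATINDEX('%[A-Z]%', 'abcde' COLLATE Latin1_General_BIN) AS no_upper;     -- returns 0
```

The trade-off is that the whole comparison becomes code-point based, so the explicit letter list from the question remains the option that does not depend on collation ordering at all.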