Non-latin-characters ordering in database with "order by"
I just found some strange behavior of database's "order by" clause. In string comparison, I expected some characters such as '[' and '_' are greater than latin characters/digits such as 'I' or '2' considering their orders in the ASCII table. However, the sorting results from database's "order by" clause is different with my expectation. Here's my test:
SQLite version 3.6.23
select * from products order by name asc;
name
...
[B@
_ref
123
1ab
...
This behavior is different from Java's string comparison (which cost me some time to find this issue). I can verify this in both SQLite 3.6.23 and Microsoft SQL Server 2005. I did some web search but cannot find any related documentation. Could someone shed me some light on it? Is it a SQL standard? Where can I find some information about this? Thanks in advance.
The concept of comparing and ordering the characters in a database is called collation.
How the strings are stored depends on the collation which is usually set in the server, client or session properties.
In MySQL
:
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_BIN
--
'A'
'B'
'a'
'b'
and
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_GENERAL_CI
--
'a'
'A'
'b'
'B'
UTF8_BIN
sorts characters according to their unicode. Caps have lower unicodes and therefore go first.
UTF8_GENERAL_CI
sorts characters according to their alphabetical position, disregarding case.
Collation is also important for indexes, since the indexes rely heavily on sorting and comparison rules.
The important keyword in this case is 'collation'. I have no experience with SQLite, but would expect it to be similar to other database engines in that you can define the collation to use for whole databases, single tables, per connection, etc.
Check your DB documentation for the options available to you.
The ASCII codes for lower-case characters such as 'i' are greater than the ones for '[' and '_':
'i': 105
'[': 91
'_': 95
However, try to insert upper-case characters, eg. try with "IPOD" or "Iphone", those will become before "_" and "[" with the default binary collation.
精彩评论