开发者

Non-latin-characters ordering in database with "order by"

I just found some strange behavior of database's "order by" clause. In string comparison, I expected some characters such as '[' and '_' are greater than latin characters/digits such as 'I' or '2' considering their orders in the ASCII table. However, the sorting results from database's "order by" clause is different with my expectation. Here's my test:

SQLite version 3.6.23

Enter ".help" for instructions

Enter SQL statements terminated with a ";"

sqlite> create table products(name varchar(10));

sqlite> insert into products values('ipod');

sqlite> insert into products values('iphone');

sqlite> insert into products values('[apple]');

sqlite> insert into products values('_ipad');

sqlite> sel开发者_Go百科ect * from products order by name asc;

[apple]

_ipad

iphone

ipod

select * from products order by name asc;
name
...
[B@
_ref
123
1ab
...

This behavior is different from Java's string comparison (which cost me some time to find this issue). I can verify this in both SQLite 3.6.23 and Microsoft SQL Server 2005. I did some web search but cannot find any related documentation. Could someone shed me some light on it? Is it a SQL standard? Where can I find some information about this? Thanks in advance.


The concept of comparing and ordering the characters in a database is called collation.

How the strings are stored depends on the collation which is usually set in the server, client or session properties.

In MySQL:

SELECT  *
FROM    (
        SELECT  'a' AS str
        UNION ALL
        SELECT  'A' AS str
        UNION ALL
        SELECT  'b' AS str
        UNION ALL
        SELECT  'B' AS str
        ) q
ORDER BY
        str COLLATE UTF8_BIN


--
'A'
'B'
'a'
'b'

and

SELECT  *
FROM    (
        SELECT  'a' AS str
        UNION ALL
        SELECT  'A' AS str
        UNION ALL
        SELECT  'b' AS str
        UNION ALL
        SELECT  'B' AS str
        ) q
ORDER BY
        str COLLATE UTF8_GENERAL_CI


--
'a'
'A'
'b'
'B'

UTF8_BIN sorts characters according to their unicode. Caps have lower unicodes and therefore go first.

UTF8_GENERAL_CI sorts characters according to their alphabetical position, disregarding case.

Collation is also important for indexes, since the indexes rely heavily on sorting and comparison rules.


The important keyword in this case is 'collation'. I have no experience with SQLite, but would expect it to be similar to other database engines in that you can define the collation to use for whole databases, single tables, per connection, etc.

Check your DB documentation for the options available to you.


The ASCII codes for lower-case characters such as 'i' are greater than the ones for '[' and '_':

'i': 105
'[': 91
'_': 95

However, try to insert upper-case characters, eg. try with "IPOD" or "Iphone", those will become before "_" and "[" with the default binary collation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜