
Sorting algorithm in OpenOffice Calc

I'm having a really long day, the culmination of which is a dumb moment trying to sort a list of strings. Calc sorts them like this:

0DCv6UlY6T0
0ITZEBZrwMk
1062VEX2EfI
2jk7hilGMs0
2lZVu3haI6A
3f8s3KbFQ0Q
3hB09daYLmk
43Erj3qFxxo
6lj33w3YoOw
7jiNQnkfx0k
7TSMj6g3UoE
7Wba8IUk6v8
9hbG9dS7zl0
ALThJiGFBSc
by_VzOiPhZM
Ce250P1xep0
Cgx6DV6RJg8
d5dDgLRd1-o
DnyzZwaYDXE
dO5KLh2er4E

This isn't quite what I expected. Look at the last 3 values. Shouldn't the entry starting with a capital D come before the ones starting with lowercase d (or the other way around)? Why does it come between the lowercase d entries?

Funnily, command line sort in Linux does things the same way. Can somebody explain the logic behind such sorting? I need to replicate it (or reproduce it in Python, if it's already implemented somewhere).


It's because of locale. See the difference between:

sort inputfile

and (what you probably want):

LANG="C" sort inputfile

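Output of the second command (if LC_ALL or LC_COLLATE is set in your environment, it overrides LANG, so use LC_ALL=C sort inputfile instead):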

0DCv6UlY6T0
0ITZEBZrwMk
1062VEX2EfI
2jk7hilGMs0
2lZVu3haI6A
3f8s3KbFQ0Q
3hB09daYLmk
43Erj3qFxxo
6lj33w3YoOw
7TSMj6g3UoE
7Wba8IUk6v8
7jiNQnkfx0k
9hbG9dS7zl0
ALThJiGFBSc
Ce250P1xep0
Cgx6DV6RJg8
DnyzZwaYDXE
by_VzOiPhZM
d5dDgLRd1-o
dO5KLh2er4E
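
For the Python side of the question: as far as I know, Python's built-in sorted() compares strings by code point and ignores the locale, so for ASCII data like this it should give the same order as the C locale output above. A minimal sketch, using a few of the strings from the question (the variable names are just for illustration):

```python
# A few of the strings from the question, in the order Calc produced them.
calc_order = [
    "ALThJiGFBSc",
    "by_VzOiPhZM",
    "Ce250P1xep0",
    "Cgx6DV6RJg8",
    "d5dDgLRd1-o",
    "DnyzZwaYDXE",
    "dO5KLh2er4E",
]

# Plain sorted() compares code points: digits, then upper case, then lower case.
c_order = sorted(calc_order)

for s in c_order:
    print(s)
```

Here all the upper-case entries come out first and "by_VzOiPhZM" lands after "DnyzZwaYDXE", just as in the LANG="C" listing.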


Whether capital letters sort separately from lower-case letters depends on the locale (specifically LC_COLLATE). That explains the behaviour of the command line sort program (and ls, among others), and presumably also OpenOffice.

E.g.


$ cat test 
Abc
aabc
$ sort test 
aabc
Abc
$ LC_COLLATE=C sort test 
Abc
aabc


For replication:

   data = ["abc", "aBB", "abD", "Aac", "AAb", "ABc", "ABa"]
   # Compare items by their upper-cased form, so case is ignored.
   print(sorted(data, key=lambda item: item.upper()))

The trick is to provide the key argument. This function is applied to each list item, and the result is used for comparisons during the sort.
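
One caveat with the key above: strings that differ only in case compare equal, so they keep whatever order they had in the input. A possible refinement, sketched below, is to use a tuple key that falls back to the original string as a tie-breaker. This only approximates locale collation for plain ASCII data (real collation rules are more involved), and the name case_insensitive_key is mine, not something from a library.

```python
def case_insensitive_key(item):
    """Compare case-insensitively first, then by the original string to break ties."""
    return (item.casefold(), item)


# The last few strings from the question, deliberately out of order.
data = [
    "dO5KLh2er4E",
    "DnyzZwaYDXE",
    "Cgx6DV6RJg8",
    "d5dDgLRd1-o",
    "by_VzOiPhZM",
    "Ce250P1xep0",
]

result = sorted(data, key=case_insensitive_key)
for s in result:
    print(s)
```

For these strings the result has "DnyzZwaYDXE" between the two lower-case d entries, which is the order Calc showed in the question.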
