
Is there a standard way to sort by a non-English alphabet? For example, the Romanian alphabet is "a ă â b c..." [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

How do I sort unicode strings alphabetically in Python?

As a citizen of the Rest-of-the-World, I'm really annoyed by the fact that computers aren't adapted by default to deal with international issues. Many sites still don't use Unicode and PHP is still in the Dark Ages.

When I want to sort a list of words or names in Romanian, I always have to write my own functions, which are hardly efficient. There must be some locale setting that makes sort functions obey the alphabetical order of the specified language, right?

I'm mainly interested in Python, Java and JavaScript.

EDIT: I found my answer for Python here, as pointed out by Chris Morgan.


In Python, you can always use the sorted function with a key parameter. For example, Turkish has letters like 'ç', 'ı', 'ş', etc. To sort according to the Turkish alphabet, I would define a string that lists the letters in alphabetical order and sort each character by its position in that string, like this:

>>> letters = "abcçdefgğhıijklmnoöprsştuüvyz"  # Turkish alphabet
>>> sorted("açobzöğge")  # Python's default
['a', 'b', 'e', 'g', 'o', 'z', 'ç', 'ö', 'ğ']
>>> sorted("açobzöğge", key=lambda i: letters.index(i))  # with key parameter
['a', 'b', 'ç', 'e', 'g', 'ğ', 'o', 'ö', 'z']

Note: dealing with Unicode is easier in Python 3.

Edit: as pointed out in the comments, this is more efficient with a dictionary, because each lookup then takes constant time instead of scanning the string:

>>> letters = "abcçdefgğhıijklmnoöprsştuüvyz"
>>> d = {ch: i for i, ch in enumerate(letters)}  # letter -> position
>>> sorted("açobzöğge", key=d.get)
['a', 'b', 'ç', 'e', 'g', 'ğ', 'o', 'ö', 'z']
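The examples above sort single characters. As a rough sketch, the same idea can be extended to whole words by mapping each word to a list of letter positions. The names LETTERS, RANK and alphabet_key are made up for this illustration, and the sketch assumes lowercase input (Turkish upper/lower casing of ı/i needs extra care); characters outside the alphabet are simply placed last.

```python
# Turkish alphabet, as in the answer above.
LETTERS = "abcçdefgğhıijklmnoöprsştuüvyz"

# Hypothetical helper table: letter -> position in the alphabet.
RANK = {ch: i for i, ch in enumerate(LETTERS)}


def alphabet_key(word):
    """Map a word to a list of letter positions.

    Characters not in LETTERS get a rank past the end, so they sort last.
    Assumes the word is already lowercase.
    """
    return [RANK.get(ch, len(LETTERS)) for ch in word]


words = ["zil", "çay", "cam", "şeker", "su", "ığdır", "iz"]
sorted_words = sorted(words, key=alphabet_key)
print(sorted_words)
# ['cam', 'çay', 'ığdır', 'iz', 'su', 'şeker', 'zil']
```

Lists compare element by element, so words are ordered by their first differing letter, just as in a dictionary. This still ignores everything a real collation algorithm handles (case, accents as secondary differences, digraphs), so it is only suitable for simple cases.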


There is no single, unified sorting algorithm that's correct for all languages, because many languages have very specific sorting rules.

It goes even further than that: even within a single language, the sort order can vary depending on what it's used for (for example, German dictionaries are sorted slightly differently from phone books).

The entire topic is called Collation. The Wikipedia article on Collating sequence is relevant as well.

There seems to be a project called python-collate that implements correct collation for many languages.
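As for the "locale setting" the question asks about: in Python, the standard library's locale module exposes the platform's collation rules through locale.strxfrm. The following is only a sketch under a few assumptions: the helper name sort_with_locale is made up, the locale name "ro_RO.UTF-8" is a guess that depends on the operating system and on which locales are installed, and the quality of the resulting order depends entirely on the platform's C library.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the platform's collation rules for locale_name.

    Falls back to plain code point order if the locale is not installed.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Assumption: silently falling back is acceptable for this sketch.
        return sorted(words)
    try:
        # strxfrm turns each string into a key that compares in locale order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = ["zid", "țară", "apă", "ăsta", "brad"]
result = sort_with_locale(names, "ro_RO.UTF-8")  # locale name is an assumption
print(result)
```

Note that setlocale changes process-wide state, so this approach is awkward in multi-threaded programs. A library built on a dedicated collation implementation avoids both that problem and the dependence on installed system locales.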
