Sorting Katakana names

If I have a list of Katakana names, what is the best way to sort them? Also, is it more common to sort names as {first name}{last name} or as {last name}{first name}? Another question: how do we get the Hiragana representation of the first character of a Katakana name, as is done to sort the iPhone's contact list? Thanks.


In Japan it is common (if not expected) for a person's given name to appear after the surname when written: {last} {first}. This depends on context, though: in a less formal setting, {first} {last} is also acceptable.

http://en.wikipedia.org/wiki/Japanese_name

Not that it matters, but why would the names of individuals be written in Katakana and not in the traditional Kanji?


I think it's (assuming PHP, with the locale already set through setlocale()):

sort($array, SORT_LOCALE_STRING);

Please provide more information if that doesn't fit your case.


This answer talks about using the system locale to sort Unicode strings in PHP. Besides threading issues, it is also dependent on your vendor having supplied you with a correct locale for what you want to use. I’ve had so much trouble with that particular issue that I’ve given up using vendor locales altogether.

If you’re worried about different pronunciations of Unihan ideographs, then you probably need access to the Unihan database — or its moral equivalent. A smaller subset may suffice.

For example, I know that in Perl, kanji are ordered according to the JIS X 0208 standard when the Japanese "ja" locale is selected in the constructor for Unicode::Collate::Locale. This doesn’t depend on the system locale, so you can rely on it.

I’ve also had good luck in Perl with Lingua::JA::Romanize::Japanese, as that’s somewhat friendlier to use than accessing Unicode::Unihan directly.

Back to PHP. This article observes that you can’t get PHP to sort Japanese correctly.

I’ve taken his set of strings and run it through Perl’s sort, and I indeed get a different answer than he gets. If I use the default or English locale, I get in Perl what he gets in PHP. But if I use the Japanese locale for the collation module — which has nothing to do with the system locale and is completely thread-safe — then I get a rather different result. Watch:

JA Sort                          = EN Sort
------------------------------------------------------------
Java                               Java
NVIDIA                             NVIDIA
Windows ファイウォール             Windows ファイウォール
インターネット オプション          インターネット オプション
キーボード                         キーボード
システム                           システム
タスク                             タスク
フォント                           フォント
プログラムの追加と削除             プログラムの追加と削除
マウス                             マウス
メール                             メール
音声認識                         ! 地域と言語オプション
画面                             ! 日付と時刻
管理ツール                       ! 画面
自動更新                         ! 管理ツール
地域と言語オプション             ! 自動更新
電源オプション                     電源オプション
電話とモデムのオプション           電話とモデムのオプション
日付と時刻                       ! 音声認識

I don’t know whether this will help you at all, because I don’t know how to get at the Perl bits from PHP (can you?), but here is the program that generates that. It uses a couple of non-standard modules installed from CPAN to do its business.

#!/usr/bin/env perl
#
# jsort - demo showing how Perl sorts Japanese in a 
#          different way than PHP does.
#
# Data taken from http://www.localizingjapan.com/blog/2011/02/13/sorting-in-japanese-—-an-unsolved-problem/
#
# Program by Tom Christiansen <tchrist@perl.com>
# Saturday, April 9th, 2011

use utf8;
use 5.10.1;
use strict;
use autodie;
use warnings;
use open qw[ :std :utf8 ];

use Unicode::Collate::Locale;
use Unicode::GCString;

binmode(DATA, ":utf8");

my @data = <DATA>;
chomp @data;

my $ja_sorter = Unicode::Collate::Locale->new(locale => "ja");
my $en_sorter = Unicode::Collate::Locale->new(locale => "en");

my @en_data = $en_sorter->sort(@data);
my @ja_data = $ja_sorter->sort(@data);

my $gap = 8;
my $width = 0;
for my $datum (@data) {
    my $columns = width($datum);
    $width = $columns if $columns > $width;
}
my $bar = "-" x ( 2 + 2 * $width + $gap );
$width = -($width + $gap);
say justify($width => "JA Sort"), "= ", "EN Sort";
say $bar;

for my $i ( 0 .. $#data ) {
    my $same = $ja_data[$i] eq $en_data[$i] ? " " : "!";
    say justify($width => $ja_data[$i]), $same, " ", $en_data[$i];
}

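sub justify {
    # Pad $str to abs($len) display columns:
    # negative $len left-justifies, positive right-justifies.
    my($len, $str) = @_;
    my $alen = abs($len);
    my $cols = width($str);

    my $spacing = ($alen > $cols) ? " " x ($alen - $cols) : "";

    return ($len < 0)
        ? $str . $spacing
        : $spacing . $str;
}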

sub width {
    return 0 unless @_;
    my $str = shift();
    return 0 unless length $str;
    return Unicode::GCString->new($str)->columns;
}


__END__
システム
画面
Windows ファイウォール
インターネット オプション
キーボード
メール
音声認識
管理ツール
自動更新
日付と時刻
タスク
プログラムの追加と削除
フォント
電源オプション
マウス
地域と言語オプション
電話とモデムのオプション
Java
NVIDIA

Hope this helps. It shows that it is, at least theoretically, possible.
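
To see where the two kanji orderings in the table could come from, here is a minimal Python sketch. Python is used only for illustration; this is not what the Perl program does internally. It assumes that the EN column matches plain code-point order for these kanji, and it approximates JIS X 0208 order by comparing EUC-JP bytes, which is a stand-in technique and not a real collation algorithm. The names kanji_entries, by_code_point and by_jis_x_0208 are made up for this example.

```python
# Kanji-initial entries from the table above, in arbitrary order.
kanji_entries = [
    "画面", "音声認識", "管理ツール", "自動更新", "日付と時刻",
    "電源オプション", "地域と言語オプション", "電話とモデムのオプション",
]

# Python compares str values by code point, so a plain sort is code-point order.
by_code_point = sorted(kanji_entries)

# Assumption: EUC-JP bytes follow JIS X 0208 row/cell order, so comparing the
# encoded bytes approximates the order a JIS X 0208 tailoring gives to kanji.
by_jis_x_0208 = sorted(kanji_entries, key=lambda s: s.encode("euc_jp"))

# Print the two orderings side by side (JIS X 0208 order, then code-point order).
for jis, cp in zip(by_jis_x_0208, by_code_point):
    print(f"{jis}\t{cp}")
```

If the assumptions hold, the first column reproduces the kanji rows of the JA column and the second reproduces the EN column. The common (level 1) kanji in JIS X 0208 are arranged by a representative reading, which is why the JA order looks closer to a dictionary order than the code-point order does.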


EDIT

This answer to "How can I use Perl libraries from PHP?" references a PHP package that does that for you. So if you don’t find a PHP library with the needed Japanese sorting support, you should be able to use the Perl module. The only one you need is Unicode::Collate::Locale. It comes standard as of release 5.14 (really 5.13.4, but that’s a devel version), and you can always install it from CPAN if you have an earlier version of Perl.
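
The last part of the question, getting the Hiragana form of a Katakana name's first character for a contact-list style index, is not covered above. Here is a minimal Python sketch, again only for illustration. It relies on the Unicode layout in which the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their Hiragana counterparts. The helper names katakana_to_hiragana and index_kana are hypothetical, and whether the iPhone groups contacts exactly this way is an assumption.

```python
import unicodedata

KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance from a Katakana letter to its Hiragana twin


def katakana_to_hiragana(text):
    """Return text with Katakana letters replaced by Hiragana; other characters unchanged."""
    # NFKC folds halfwidth Katakana (e.g. ﾀ) into the regular fullwidth forms first.
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def index_kana(name):
    """Hypothetical helper: first character of name as Hiragana, or '' for an empty name."""
    return katakana_to_hiragana(name)[:1]


if __name__ == "__main__":
    for name in ["タナカ", "スズキ", "ﾔﾏﾀﾞ"]:
        print(name, index_kana(name))
```

A real contact list would probably group these index characters further, for example by folding voiced and small kana into their base row, which this sketch does not attempt.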
