
Check which of two variants is traditional Chinese and which is simplified

I'm getting inconsistent results from the Google Maps API:

|Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路 
|Head southwest on TRAD/SIMP toward SIMP/TRAD

Currently I am matching Chinese words with this regex: ([^\u0000-\u0080]|/)+

Then I explode the matches into pairs such as 吳江路 vs 吴江路 and remove the common characters. Is there a way to tell which of 吳 and 吴 is the traditional character and which is the simplified one?
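For reference, the matching and pairing step described above might look roughly like this in Python. This is only a sketch: the function name `differing_pairs` is made up for illustration, and it assumes each match has the simple `variant/variant` shape with both sides the same length.

```python
import re

# Same pattern as in the question, written as a non-capturing group so that
# findall() returns each whole run instead of only the last matched character.
CJK_RUN = re.compile(r"(?:[^\u0000-\u0080]|/)+")

def differing_pairs(text):
    """Return (left_char, right_char) tuples for the characters that differ
    between the two variants of each 'AAA/BBB' run found in text."""
    pairs = []
    for run in CJK_RUN.findall(text):
        parts = run.split("/")
        # Assumption: only handle exactly two variants of equal, non-zero length.
        if len(parts) != 2 or not parts[0] or len(parts[0]) != len(parts[1]):
            continue
        for left, right in zip(parts[0], parts[1]):
            if left != right:  # drop the characters both variants share
                pairs.append((left, right))
    return pairs

print(differing_pairs("Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路"))
```

What remains after this step is a list of single-character pairs whose order (traditional first or simplified first) is still unknown.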


You need a traditional->simplified mapping table for Unicode. A web search should turn one up easily. If you can't find one, you can build one by downloading a Big5->GB mapping table and converting both sides to Unicode (via Big5->Unicode and GB->Unicode mapping tables, which are readily available).
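As a rough sketch of the "convert both sides to Unicode" step: Python ships `big5` and `gb2312` codecs, so they can stand in for the Big5->Unicode and GB->Unicode tables. The Big5->GB table itself still has to come from somewhere; the `sample_pairs` list below is a hand-made stand-in for it, and `to_unicode_table` is a hypothetical helper name.

```python
def to_unicode_table(big5_gb_pairs):
    """Turn (big5_bytes, gb_bytes) pairs into a {traditional: simplified}
    dict of Unicode characters."""
    table = {}
    for big5_bytes, gb_bytes in big5_gb_pairs:
        trad = big5_bytes.decode("big5")    # Big5 -> Unicode
        simp = gb_bytes.decode("gb2312")    # GB -> Unicode
        if trad != simp:                    # skip characters that are not converted
            table[trad] = simp
    return table

# Stand-in for a real Big5->GB mapping file (assumption: three sample entries,
# built here by encoding known characters rather than read from a download).
sample_pairs = [
    ("吳".encode("big5"), "吴".encode("gb2312")),
    ("興".encode("big5"), "兴".encode("gb2312")),
    ("路".encode("big5"), "路".encode("gb2312")),  # same in both scripts
]

T2S = to_unicode_table(sample_pairs)
print(T2S)
```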

If a character appears on the "simplified" side of the table, it is most likely a simplified character (since some traditional character maps to it).

Note that this is a heuristic rather than a rigorous method: multiple traditional characters may map to a single simplified character, and that simplified character may itself be identical to a traditional character. In that case, you'll need to decide whether to call it traditional or not.

For example, 後 is mapped to 后 in simplified Chinese, but 后 is also a traditional character in its own right, meaning "queen".
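A minimal sketch of that single-character check, including the ambiguous case. The three-entry `T2S` table and the `AMBIGUOUS` set are illustrative samples, not a real mapping table, and `guess_script` is a made-up name.

```python
# Tiny sample of a traditional -> simplified table (assumption: a real table
# would have thousands of entries).
T2S = {"吳": "吴", "興": "兴", "後": "后"}

# Characters that appear on the simplified side of the table.
SIMPLIFIED_SIDE = set(T2S.values())

# Simplified forms that are also valid traditional characters. This list is a
# policy decision you have to maintain yourself.
AMBIGUOUS = {"后"}

def guess_script(ch):
    """Best-effort guess for one character; not a definitive classification."""
    if ch in AMBIGUOUS:
        return "ambiguous"
    if ch in T2S:
        return "traditional"
    if ch in SIMPLIFIED_SIDE:
        return "simplified"
    return "unknown"  # same in both scripts, or simply not in the table

for ch in "吳吴后路":
    print(ch, guess_script(ch))
```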

If you are only comparing pairs of characters, try looking up the conversion in both directions. You will find a conversion in at most one direction, and that tells you which is which.
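For pairs, the two-direction lookup could be sketched like this. Again, `T2S` is a small illustrative sample rather than a complete table, and `classify_pair` is a hypothetical helper.

```python
# Sample traditional -> simplified entries (assumption: stand-in for a full table).
T2S = {"吳": "吴", "興": "兴", "後": "后"}

def classify_pair(a, b):
    """Given two variant characters, return which is traditional and which is
    simplified, or None if the table has no conversion in either direction."""
    if T2S.get(a) == b:  # a converts to b, so a is the traditional form
        return {"traditional": a, "simplified": b}
    if T2S.get(b) == a:  # b converts to a, so b is the traditional form
        return {"traditional": b, "simplified": a}
    return None

print(classify_pair("吳", "吴"))
print(classify_pair("兴", "興"))
```

Because both characters of the pair are known to be variants of each other, the ambiguity around characters like 后 does not arise here: only one direction can match.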
