
Is there a way to tell if a unicode character is a control, alpha, numeric or symbolic?

Assuming all you have is the binary data and no pre-canned functions, is there a pattern or algorithm to categorize the type of character?


You ask an API to tell you. In Java, you use the Character class. In C++, you can use ICU. If your language doesn't have this, you download the properties database from unicode.org and incorporate it.

In other words, there is no pattern or algorithm. There are tables published by the Unicode consortium that contain the information.
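In Java, for example, you can get a coarse control/alpha/numeric/symbol split from the General_Category value that `Character.getType` returns. The sketch below is illustrative. The class name `CharKind` and the four bucket names are assumptions chosen to match the question. Anything outside those buckets goes to "other": marks, punctuation, separators, and unassigned code points.

```java
// Illustrative helper (hypothetical name): buckets a code point using the
// Unicode General_Category data built into java.lang.Character.
final class CharKind {
    private CharKind() {}

    /** Returns "control", "alpha", "numeric", "symbol", or "other". */
    static String classify(int codePoint) {
        int type = Character.getType(codePoint);
        if (type == Character.CONTROL) {
            return "control";                       // Cc
        }
        if (Character.isLetter(codePoint)) {
            return "alpha";                         // Lu, Ll, Lt, Lm, Lo
        }
        if (type == Character.DECIMAL_DIGIT_NUMBER
                || type == Character.LETTER_NUMBER
                || type == Character.OTHER_NUMBER) {
            return "numeric";                       // Nd, Nl, No
        }
        if (type == Character.MATH_SYMBOL
                || type == Character.CURRENCY_SYMBOL
                || type == Character.MODIFIER_SYMBOL
                || type == Character.OTHER_SYMBOL) {
            return "symbol";                        // Sm, Sc, Sk, So
        }
        return "other";                             // marks, punctuation, separators, format, unassigned
    }
}
```

Here are some results. `CharKind.classify('A')` gives "alpha". `CharKind.classify(0x00BD)` (½) gives "numeric". `CharKind.classify('+')` gives "symbol". The method takes the code point as an `int` rather than a `char`, so characters outside the Basic Multilingual Plane are handled too.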


No, there's no pattern. You will need to create some lookup tables. (Well, I suppose you could do it with a maze of ifs, but it wouldn't be nice.)
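Here is a rough sketch of such a lookup table. The class below parses lines in the UnicodeData.txt format from unicode.org. Fields are semicolon-separated, with the code point in field 0 and the General_Category in field 2. Queries are answered by binary search over code point ranges. The sketch makes these assumptions:

- The lines are sorted by code point, as in the published file.
- Large blocks appear as `<..., First>`/`<..., Last>` pairs.
- Any code point not listed is unassigned (`Cn`).

The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lookup table built from UnicodeData.txt-style lines
// ("code point;name;General_Category;..."), queried by binary search.
final class UcdTable {
    private final List<int[]> ranges = new ArrayList<>();      // {start, end}, sorted, non-overlapping
    private final List<String> categories = new ArrayList<>(); // category for each range

    UcdTable(List<String> lines) {
        int rangeStart = -1;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.split(";", -1);
            int cp = Integer.parseInt(fields[0], 16);
            String name = fields[1];
            String category = fields[2];
            if (name.endsWith(", First>")) {
                rangeStart = cp;                    // start of a compressed block
            } else if (name.endsWith(", Last>")) {
                add(rangeStart, cp, category);      // end of that block
                rangeStart = -1;
            } else {
                add(cp, cp, category);              // ordinary single entry
            }
        }
    }

    private void add(int start, int end, String category) {
        ranges.add(new int[] {start, end});
        categories.add(category);
    }

    /** Two-letter General_Category; code points absent from the data count as "Cn". */
    String category(int cp) {
        int lo = 0, hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int[] r = ranges.get(mid);
            if (cp < r[0]) hi = mid - 1;
            else if (cp > r[1]) lo = mid + 1;
            else return categories.get(mid);
        }
        return "Cn";
    }

    /** Coarse class taken from the major category letter. */
    String kind(int cp) {
        String gc = category(cp);
        switch (gc.charAt(0)) {
            case 'L': return "alpha";
            case 'N': return "numeric";
            case 'S': return "symbol";
            case 'C': return gc.equals("Cc") ? "control" : "other";
            default:  return "other";               // M (marks), P (punctuation), Z (separators)
        }
    }
}
```

For example, load the lines `0030;DIGIT ZERO;Nd;...` and `0041;LATIN CAPITAL LETTER A;Lu;...`. Then `kind(0x30)` is "numeric" and `kind(0x41)` is "alpha". To load the real file, pass the result of `java.nio.file.Files.readAllLines(path)` to the constructor.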

Luckily in most environments there is a pre-canned API function to do it for you, because building the character class data tables is super-boring.
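If you do end up building the tables yourself, a two-stage (multistage) table is a common way to keep them small. This technique is widely used for Unicode property data. The high bits of the code point select a block, and identical blocks are stored only once. The sketch below assumes you already have a function that returns a small category code (0 to 127) for each code point, for example one filled in from the Unicode data files. `TwoStageTable` and its 256-entry block size are illustrative choices, not a standard API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Illustrative two-stage table: stage1 maps (cp >> 8) to a block number,
// stage2 stores each distinct 256-entry block of category codes once.
final class TwoStageTable {
    private static final int SHIFT = 8;
    private static final int BLOCK = 1 << SHIFT;                      // 256 code points per block
    private static final int MAX_CP = 0x10FFFF;

    private final int[] stage1 = new int[(MAX_CP >> SHIFT) + 1];      // 0x1100 entries
    private final byte[] stage2;

    /** categoryOf must return a value in 0..127 for every code point 0..0x10FFFF. */
    TwoStageTable(IntUnaryOperator categoryOf) {
        Map<String, Integer> seen = new HashMap<>();   // block contents -> block number
        List<byte[]> blocks = new ArrayList<>();
        for (int hi = 0; hi < stage1.length; hi++) {
            byte[] block = new byte[BLOCK];
            for (int lo = 0; lo < BLOCK; lo++) {
                block[lo] = (byte) categoryOf.applyAsInt((hi << SHIFT) | lo);
            }
            String key = Arrays.toString(block);
            Integer index = seen.get(key);
            if (index == null) {                       // first time this block appears
                index = blocks.size();
                seen.put(key, index);
                blocks.add(block);
            }
            stage1[hi] = index;
        }
        stage2 = new byte[blocks.size() * BLOCK];
        for (int i = 0; i < blocks.size(); i++) {
            System.arraycopy(blocks.get(i), 0, stage2, i * BLOCK, BLOCK);
        }
    }

    /** Two array reads; cp must be in 0..0x10FFFF. */
    int lookup(int cp) {
        return stage2[(stage1[cp >> SHIFT] << SHIFT) | (cp & (BLOCK - 1))];
    }

    /** Number of distinct blocks actually stored. */
    int distinctBlocks() {
        return stage2.length / BLOCK;
    }
}
```

Each lookup costs two array reads. Whole regions such as the unassigned planes produce identical blocks, so they share a single stored copy. That sharing is where most of the space saving comes from.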


I have recently published my FOSS Unicode Converter, and it uses the latest Unicode Character Database (Unicode Standard Annex #44, which covers Unicode 5.2).

In this (XML) database you can search for the character you want by its hex code and see whether it is numeric or has whatever other property you need.
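Suppose the database is the XML form of the UCD that unicode.org publishes (described in UAX #42). In that format, each `<char>` element gives the code point in a `cp` attribute, or in a `first-cp`/`last-cp` pair for ranges. Its General_Category is in a `gc` attribute. A minimal sketch using the JDK's DOM parser follows. It assumes the "flat" variant of the file, where every attribute sits on the `<char>` element itself. `UcdXmlLookup` is a hypothetical name. The converter's own XML layout is not shown here and may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: finds the "gc" (General_Category) attribute for a code
// point in UCD XML data, assuming the flat layout of <char cp=".." gc=".."/>.
final class UcdXmlLookup {
    private UcdXmlLookup() {}

    /** Returns the gc value, or null if no <char> element covers the code point. */
    static String generalCategory(String xml, int codePoint) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse UCD XML", e);
        }
        NodeList chars = doc.getElementsByTagName("char");
        for (int i = 0; i < chars.getLength(); i++) {
            Element e = (Element) chars.item(i);
            int first, last;
            if (e.hasAttribute("cp")) {                 // single code point
                first = last = Integer.parseInt(e.getAttribute("cp"), 16);
            } else {                                    // range such as a CJK block
                first = Integer.parseInt(e.getAttribute("first-cp"), 16);
                last = Integer.parseInt(e.getAttribute("last-cp"), 16);
            }
            if (codePoint >= first && codePoint <= last) {
                return e.getAttribute("gc");            // "" if the attribute is missing
            }
        }
        return null;
    }
}
```

This sketch parses the whole document on every call, which is only for illustration. The full UCD XML file is large, so for repeated lookups you would parse it once and keep a map from code point to category.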

You can try this out in my project, and if it turns out to be useful, you can use its database.

http://unicode.codeplex.com is the main repository for the project. You can browse the code or download the executable there.
