
Is it possible to convert language-specific characters to Latin characters in UTF-8?

I am wondering whether there are any relationships or existing algorithms for converting national characters to their equivalent Latin characters within UTF-8.

For example (in Polish):

Ą -> A

Ó -> O

ż -> z

ź -> z ...

So a phrase like 'zażółć gęślą jażń'

should convert to 'zazolc gesla jazn'.

Currently I am using a conversion array for Polish, but I am looking for a universal solution that handles all Latin-based languages.
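For comparison, the per-language conversion array described above can be sketched in Python with `str.translate`. The table is my own assumption of what a Polish mapping would contain (the nine lowercase diacritic letters and their capitals); every additional language would need its own entries, which is why this approach does not scale.

```python
# Per-language lookup table: each Polish diacritic letter maps to one plain
# Latin letter. The contents are an illustrative assumption for Polish only.
POLISH_TO_LATIN = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)


def polish_to_latin(text: str) -> str:
    """Replace Polish diacritic letters with their plain Latin counterparts."""
    return text.translate(POLISH_TO_LATIN)


print(polish_to_latin("zażółć gęślą jażń"))  # zazolc gesla jazn
```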

Thanks


Check this:

http://sourceforge.net/projects/iconvnet/

In general, search for iconv.


To make the answer complete: searching for 'Unicode decomposition + C#' led me to this CodeProject article (codeproject.com/KB/cs/UnicodeNormalization.aspx?display=Print), which offers a ready-to-use solution. The value of being able to name what you are looking for can't be overstated ;) Thanks for all the answers.
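The same idea (canonical decomposition, then discarding the combining marks) can be sketched in Python with the standard `unicodedata` module. This is not the code from the CodeProject article, just a minimal illustration of the technique. Decomposition does not cover every letter: `ł` carries a stroke rather than a combining mark and has no canonical decomposition, so it needs an explicit fallback. The fallback table `NON_DECOMPOSABLE` is my own assumption and is not exhaustive.

```python
import unicodedata

# Letters whose diacritic is not a separate combining mark, so NFD leaves them
# unchanged. Illustrative assumption only; extend it for other languages.
NON_DECOMPOSABLE = str.maketrans({
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
    "ø": "o", "Ø": "O",
    "ß": "ss",
})


def strip_diacritics(text: str) -> str:
    """Fold accented Latin letters to plain letters via Unicode decomposition."""
    # NFD splits e.g. "ż" into "z" + U+0307 COMBINING DOT ABOVE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (non-zero canonical combining class).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Map the letters that have no canonical decomposition.
    return base.translate(NON_DECOMPOSABLE)


print(strip_diacritics("zażółć gęślą jażń"))  # zazolc gesla jazn
```

Unlike the per-language table, this handles accents from any Latin-based language (e.g. "Crème Brûlée" becomes "Creme Brulee"); only the stroke-type letters need manual entries.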


I'm not sure this is the definitive answer you need, but when I've had to do this in the past, I converted all 'special' characters into named or numeric entities so that they were protected during the conversion process.
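A minimal Python sketch of the entity approach, assuming numeric character references (`&#NNN;`) are an acceptable protected form; the function names `protect_non_ascii` and `restore` are hypothetical. The built-in `'xmlcharrefreplace'` error handler does the escaping and `html.unescape` reverses it. Note that this preserves the characters rather than transliterating them to plain Latin letters.

```python
import html


def protect_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' turns e.g. "ż" (U+017C) into "&#380;".
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def restore(text: str) -> str:
    """Turn numeric (and named) character references back into characters."""
    return html.unescape(text)


print(protect_non_ascii("zażółć"))  # za&#380;&#243;&#322;&#263;
```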
