Is it possible to convert language-specific characters to Latin characters in UTF-8?
I am wondering if there are any relationships or existing algorithms that allow converting national characters to their equivalent Latin characters within the UTF-8 codepage?
For example (in Polish):
Ą -> A
Ó -> O
ż -> z
ź -> z ...
phrase like: 'zażółć gęślą jażń'
converts to: 'zazolc gesla jazn'
Currently I am using a conversion array for Polish, but I am looking for a universal solution that handles all Latin-based languages.
Thanks
Check this:
http://sourceforge.net/projects/iconvnet/
More generally, search for iconv.
To make the answer complete: searching for 'Unicode decomposition + C#' led me to this CodeProject article (codeproject.com/KB/cs/UnicodeNormalization.aspx?display=Print), which offers a ready-to-use solution. The ability to name what you are looking for shouldn't be underestimated ;) Thanks for all the answers.
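For readers who want to see what the decomposition approach looks like, here is a minimal sketch in Python (not the C# code from the linked article, which is not reproduced here). It decomposes the text with NFD and drops the combining marks. One caveat: some letters, such as the Polish ł, have no canonical decomposition, so the sketch falls back on a small hand-written table. The names `strip_diacritics` and `EXTRA_MAP` are made up for illustration, and the table is an assumption that is far from exhaustive.

```python
import unicodedata

# Hypothetical fallback table for letters that NFD does not decompose.
# This list is an assumption for illustration and is not exhaustive.
EXTRA_MAP = str.maketrans({
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
    "ß": "ss",
})


def strip_diacritics(text: str) -> str:
    """Return text with diacritical marks removed from Latin letters."""
    # NFD splits e.g. 'ż' into 'z' followed by a combining dot above.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Handle the letters that have no decomposition at all.
    return without_marks.translate(EXTRA_MAP)


if __name__ == "__main__":
    print(strip_diacritics("zażółć gęślą jażń"))
```

With the phrase from the question, the decomposition step alone would leave ł untouched, which is why the fallback table is still needed for a result like 'zazolc gesla jazn'.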
I'm not completely sure this is the definitive answer you need, but when I've had to do this in the past, I converted all 'special' characters into named or numeric entities so that they were protected during the conversion process.
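As a rough illustration of the entity approach, here is a sketch in Python using only the standard library. Note that it protects the characters rather than mapping them to Latin letters, so it addresses a different need than the question's examples. The function name `to_numeric_entities` is made up for this example.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' turns e.g. 'ż' (U+017C) into '&#380;'.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    original = "zażółć gęślą jażń"
    protected = to_numeric_entities(original)
    print(protected)
    # html.unescape reverses the references once processing is done.
    print(html.unescape(protected) == original)
```

The result is pure ASCII, so it can pass through a step that would otherwise mangle the characters, and the original text can be restored afterwards.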