Downgrade non-ASCII symbols to closest 7-bit ASCII equivalent (preferably Java)
Is there any simple/lightweight solution to change at least some non-ASCII symbols to their respective ASCII analogs? For example, this string
abc-åäö.txt
should be changed to
abc-aao.txt
A bit of background: Zip tools do not reliably support UTF-8, hence the need to downgrade. AFAICR Google "download attachments as single zip file" feature replaces any non-ASCII symbols with the '_' character.
PS: the code might as well be in some other language, if it's more or less understandable I'll port that to Java. PPS: my first question so far, so please don't minus me below the ground okay?
Have a look at java.text.Normalizer. It can help you with transforming equivalent characters: http://en.wikipedia.org/wiki/Unicode_equivalence
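For what it's worth, here is a minimal sketch of how the Normalizer suggestion could be applied to the file name from the question. It assumes that decomposing to NFD and then dropping the combining marks is acceptable for your use case; the class name `AsciiDowngrade` and the helper names `stripDiacritics` and `toAscii` are made up for this example. Characters that have no decomposition (for instance ø or ß) survive the first step, so the sketch falls back to '_' for them, similar to the Google behaviour mentioned in the question.

```java
import java.text.Normalizer;

public class AsciiDowngrade {

    // Hypothetical helper: decompose accented characters (NFD) so that
    // "å" becomes "a" + a combining ring, then remove all combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Hypothetical helper: anything still outside 7-bit ASCII after the
    // decomposition step is replaced with '_' (an assumption, not a rule).
    static String toAscii(String input) {
        return stripDiacritics(input).replaceAll("[^\\x00-\\x7F]", "_");
    }

    public static void main(String[] args) {
        // "abc-åäö.txt", written with escapes to avoid source-encoding issues
        String name = "abc-\u00e5\u00e4\u00f6.txt";
        System.out.println(toAscii(name)); // prints abc-aao.txt
    }
}
```

Note that this only handles characters Unicode treats as a base letter plus marks. If you need real transliteration (e.g. ß to "ss"), you would have to add your own mapping table on top.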
Maybe this would do?
Looks like the problem is solved here -
[solution][howto] Convert special characters to normal chars (é to e) http://www.ramonfincken.com/permalink/topic192.html
If you would consider using python, there is a pretty good python package called unidecode, which can get the ASCII transliterations of Unicode text.
Okay, found something more or less working in this question: PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string