Map Unicode characters to similar EBCDIC 1047 characters automatically
I'm trying to encode a string that has characters not supported by the target encoding (CP 1047).
Is there a standard/common/easy way of mapping those characters to a cp1047 equivalent?
For example, the text has a fancy double quote character (”) and I want to convert it to the straight double quote (").
Obviously I could just do the replace in my code, but is there a better way? Is there an open source tool or API out there that I don't know about?
If you want to encode Unicode characters in EBCDIC (CP 1047), then (apparently) there's UTF-EBCDIC (though I don't know of any existing tools that can convert to that).
Alternatively, I would look into using the non-standard form of Percent-encoding or XML/HTML encoding. Either one of these two encodings would probably have existing tools for encoding (such as Commons Lang StringEscapeUtils).
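To illustrate the XML/HTML-style option, here is a minimal sketch using only the JDK. It is not the Commons Lang implementation: the class and method names (NumericEntityEscaper, escapeUnencodable) are made up for this example. It assumes you pass in the target charset yourself, e.g. Charset.forName("IBM1047"), which should work on JDKs that ship the extended charsets but is worth verifying on yours.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: replaces every code point the target charset
// cannot encode with an XML-style numeric character reference.
final class NumericEntityEscaper {

    static String escapeUnencodable(String input, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (cp == '&') {
                // Escape '&' itself so the result can be decoded unambiguously.
                out.append("&amp;");
            } else if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                // e.g. U+201D becomes "&#x201D;"
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }
}
```

The upside of this approach is that nothing is lost: whoever reads the data back can decode the references. The downside is that the consumer has to know to do that.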
Finally, if you just want to 'map' extended characters into the CP 1047 space, then I guess you're left with scanning the source string character by character and building the result string from a Map<Character, Character> (or a Map<Character, String>, if one character may expand to several), so long as you know beforehand all the extended characters you have to deal with and their desired replacements.
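A rough sketch of that map-based approach, assuming a small hand-picked replacement table. The class name, the table contents and the fallback character are all illustrative choices, not any standard mapping. The target charset is a parameter; for CP 1047 you would presumably pass Charset.forName("IBM1047"), assuming your JDK provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps characters the target charset can encode,
// and swaps the rest for look-alikes from a hand-maintained table.
final class LookalikeMapper {

    // Illustrative table only; extend it with whatever your data contains.
    private static final Map<Integer, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put(0x201C, "\"");  // left double quotation mark
        REPLACEMENTS.put(0x201D, "\"");  // right double quotation mark
        REPLACEMENTS.put(0x2018, "'");   // left single quotation mark
        REPLACEMENTS.put(0x2019, "'");   // right single quotation mark
        REPLACEMENTS.put(0x2013, "-");   // en dash
        REPLACEMENTS.put(0x2014, "-");   // em dash
        REPLACEMENTS.put(0x2026, "..."); // horizontal ellipsis
    }

    static String mapToEncodable(String input, Charset target, char fallback) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                // Unknown characters fall back to a single placeholder.
                out.append(REPLACEMENTS.getOrDefault(cp, String.valueOf(fallback)));
            }
        });
        return out.toString();
    }
}
```

Usage would look something like `LookalikeMapper.mapToEncodable(text, Charset.forName("IBM1047"), '?')` followed by `getBytes` with the same charset. Checking `canEncode` first means characters CP 1047 already supports pass through untouched, so the table only needs the ones that don't.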