Bug in Apache Commons StringEscapeUtils?
Just started using Apache Commons StringEscapeUtils.
According to http://www.w3schools.com/tags/ref_entities.asp, &Ouml;
should correspond to Ö. However,
System.out.println(StringEscapeUtils.unescapeHtml4("&Ouml;"));
prints
×
Is this a bug? Or what am I missing?
I guess EntityArrays.java from the lang3 repository is buggy:
{"\u00D6", "&Otilde;"}, // Õ - uppercase O, tilde
{"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
{"\u00D8", "&times;"}, // multiplication sign
It seems that some values are shifted by one row. It should be:
{"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
{"\u00D7", "&times;"}, // multiplication sign
because &Ouml; is U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS)
and &times; is "\u00D7".
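A one-row shift produces exactly the reported symptom. The following is a minimal sketch, not the actual Commons Lang code: it assumes the unescaper inverts rows of {character, entity} pairs into an entity-to-character map, as the rows above suggest. The class `EntityShiftDemo` and its methods are hypothetical, and the corrected &Otilde; row (U+00D5) is added here for completeness.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: misaligned {character, entity} rows
// corrupt the reverse (entity -> character) lookup.
public class EntityShiftDemo {

    // Rows as in the buggy 3.0-beta excerpt: each entity sits one row too low.
    static final String[][] BUGGY_ROWS = {
        {"\u00D6", "&Otilde;"},
        {"\u00D7", "&Ouml;"},
        {"\u00D8", "&times;"},
    };

    // Corrected rows: each entity paired with its own code point.
    static final String[][] FIXED_ROWS = {
        {"\u00D5", "&Otilde;"}, // added for illustration: Õ is U+00D5
        {"\u00D6", "&Ouml;"},
        {"\u00D7", "&times;"},
    };

    // Invert the rows so an entity name maps back to its character.
    static Map<String, String> reverse(String[][] rows) {
        Map<String, String> map = new HashMap<>();
        for (String[] row : rows) {
            map.put(row[1], row[0]);
        }
        return map;
    }

    // Look up a single entity; unknown entities are returned unchanged.
    static String unescape(String entity, String[][] rows) {
        return reverse(rows).getOrDefault(entity, entity);
    }

    public static void main(String[] args) {
        String buggy = unescape("&Ouml;", BUGGY_ROWS);
        String fixed = unescape("&Ouml;", FIXED_ROWS);
        // Print code points rather than characters so the console encoding cannot interfere.
        System.out.printf("buggy: U+%04X%n", buggy.codePointAt(0)); // buggy: U+00D7
        System.out.printf("fixed: U+%04X%n", fixed.codePointAt(0)); // fixed: U+00D6
    }
}
```

With the shifted rows, &Ouml; resolves to U+00D7 (×), which matches what unescapeHtml4 returns in the question.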
With version 2.5, StringEscapeUtils.unescapeHtml("&Ouml;") gives Ö.
With version 3.0-beta, StringEscapeUtils.unescapeHtml3 and StringEscapeUtils.unescapeHtml4 give ×.
Generally I'd use the latest stable version (currently 2.5). It looks like a bug, but I couldn't find anything about it in https://issues.apache.org/jira/browse/LANG
Perhaps your console cannot display the Ö character. Check the system property file.encoding
to see which default encoding the JVM is using.
If your console supports UTF-8, you can try starting the JVM with -Dfile.encoding=utf-8,
or you can do this from your application:
System.setOut(new PrintStream(System.out, true, "utf-8"));
If the console does not support UTF-8, I suggest writing the output to a file instead, using UTF-8 encoding, and then opening the file with a text editor that can handle UTF-8.
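A minimal sketch of the two checks above, using only the JDK: print the file.encoding property, then write the text to a file with an explicit UTF-8 charset so the result does not depend on the platform default. The class `Utf8OutputDemo`, the method `writeUtf8` and the file name `unescaped.txt` are hypothetical; in practice you would write the string returned by unescapeHtml4.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;

public class Utf8OutputDemo {

    // Write text as a single line, encoded as UTF-8 regardless of
    // the platform default encoding.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, Collections.singletonList(text), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The default encoding the answer suggests checking.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));

        // Hypothetical output file; open it in an editor set to UTF-8.
        Path out = Paths.get("unescaped.txt");
        writeUtf8(out, "\u00D6");
        System.out.println("wrote " + out.toAbsolutePath());
    }
}
```

If the library is correct, the file starts with the bytes C3 96, the UTF-8 encoding of Ö.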
If none of this works, then it is probably a bug in StringEscapeUtils.
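Another way to rule the console out is to print each character's code point and Unicode name instead of the character itself; that output is plain ASCII, so no console encoding can distort it. A minimal sketch using only the JDK (Character.getName needs Java 7 or later, codePoints() Java 8); `CodePointDump` and `describe` are hypothetical names, and you would pass in the string returned by unescapeHtml4("&Ouml;").

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {

    // Describe each code point as "U+XXXX NAME", using only ASCII characters.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp ->
            out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        // The two candidate results from the question; in real code,
        // pass the string returned by unescapeHtml4 instead.
        for (String line : describe("\u00D6")) {
            System.out.println(line); // U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
        }
        for (String line : describe("\u00D7")) {
            System.out.println(line); // U+00D7 MULTIPLICATION SIGN
        }
    }
}
```

If the output reads MULTIPLICATION SIGN, the wrong character really is in the string and the console is not to blame.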