
Bug in Apache Commons StringEscapeUtils?

Just started using Apache Commons StringEscapeUtils.

According to http://www.w3schools.com/tags/ref_entities.asp, &Ouml; should correspond to Ö. However,

System.out.println(StringEscapeUtils.unescapeHtml4("&Ouml;"));

prints

×

Is this a bug? Or what am I missing?


I guess EntityArrays.java from the lang3 repository is buggy:

{"\u00D6", "&Otilde;"}, // Õ - uppercase O, tilde
{"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
{"\u00D8", "&times;"}, // multiplication sign

It seems that some values are shifted by one row. It should be:

 {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
 {"\u00D7", "&times;"}, // multiplication sign

because Ö is U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS)

and × is U+00D7 (MULTIPLICATION SIGN).
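The code point assignments above can be double-checked with the JDK alone, without Commons Lang on the classpath. This is only a minimal sketch; the class name CodePointNames and the helper nameOf are made up for illustration.

```java
public class CodePointNames {
    // Hypothetical helper: returns the Unicode name the JDK knows for a code point.
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        // Only ASCII is printed, so console encoding does not affect the result.
        System.out.println("U+00D6 = " + nameOf(0x00D6));
        System.out.println("U+00D7 = " + nameOf(0x00D7));
    }
}
```

If the names come back as the letter O with diaeresis for U+00D6 and the multiplication sign for U+00D7, the table rows quoted above are indeed off by one.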


In version 2.5, StringEscapeUtils.unescapeHtml prints Ö.

In version 3.0-beta, StringEscapeUtils.unescapeHtml3 and StringEscapeUtils.unescapeHtml4 print ×.

Generally I'd use the latest stable version (currently 2.5). It looks like a bug, but I couldn't find anything useful in https://issues.apache.org/jira/browse/LANG


Perhaps your console cannot show the Ö character. Check the system property file.encoding to see what the default console encoding is.

If your console supports UTF-8 you can try to start the JVM with -Dfile.encoding=utf-8, or you can do this from your application:

System.setOut(new PrintStream(System.out, true, "utf-8"));

If the console does not support UTF-8, I suggest writing the output to a file using UTF-8 encoding instead, then opening the file with a text editor that can handle UTF-8.
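A rough sketch of those checks, using only the JDK: it prints the default encoding, writes a string to a UTF-8 file, and also prints the string's code points in hex, which sidesteps the console encoding entirely. The class EncodingCheck, its helper methods, and the literal "\u00D6" (a stand-in for whatever unescapeHtml4 returned) are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingCheck {
    // Formats each code point as U+XXXX so the output is plain ASCII.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Writes the text to the given file, encoded as UTF-8.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String result = "\u00D6"; // stand-in for the unescaped string under test
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("code points   = " + toCodePoints(result));
        Path out = Files.createTempFile("unescaped", ".txt");
        writeUtf8(out, result);
        System.out.println("wrote " + out);
    }
}
```

If the hex output shows U+00D7 where U+00D6 was expected, the console is not at fault and the wrong character really was returned.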

If all of this doesn't work, then it is probably a bug in StringEscapeUtils.
