How To HTML Escape Curly Quotes in a Java String

2022-12-11 04:29

I've got a string that has curly quotes in it. I'd like to replace those with HTML entities to make sure they don't confuse other downstream systems. For my first attempt, I just added matching for the characters I wanted to replace, entering them directly in my code:

public static String escapeXml(String s) {
    StringBuilder sb = new StringBuilder();
    char characters[] = s.toCharArray();
    for ( int i = 0; i < characters.length; i++ ) {
        char c = characters[i];
        switch (c) {
            // other escape characters deleted for clarity
            case '“':
                sb.append("&#8220;");
                break;
            case '”':
                sb.append("&#8221;");
                break;
            case '‘':
                sb.append("&#8216;");
                break;
            case '’':
                sb.append("&#8217;");
                break;
            default:
                sb.append(c);
                break;
        }
    }
    return sb.toString();
}

This compiled and worked fine on my Mac, but when our CI server (which runs on Linux) tried to build it, it choked:

Out.java:[347,16] duplicate case label

Apparently some part of the build chain on the Linux box can't recognize and distinguish among these fancy characters.

My next attempt was to use Unicode escaping. Unfortunately, this won't even compile on my Mac:

...
            case '\u8220':
                sb.append("&#8220;");
                break;
            case '/u8221':
                sb.append("&#8221;");
                break;
...

My compiler throws this complaint:

Out.java:[346,21] unclosed character literal

I'm baffled as to how one might do this bit of substitution and have it work reliably across platforms. Does anybody have any pointers? Thanks in advance.

You can use the literal character (i.e., '‘'), but your build process needs to specify the correct source encoding during compilation. The javac command option is -encoding. (The attribute on Ant's javac task is the same.) This should match whatever encoding is used by your IDE when saving the files.

If your IDE is using UTF-8, for example, but the build machine is using its platform default encoding of US-ASCII, the special characters will be decoded as ?. Since multiple cases now have the same label, you get the original error message.
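To see how a mismatch can make distinct characters indistinguishable, here is a minimal sketch. The class name and the misread helper are made up for illustration, and it only mimics what a compiler might do: it encodes each curly quote as UTF-8 and then decodes the bytes as US-ASCII.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical helper: encode text as UTF-8, then decode those bytes
    // as US-ASCII, mimicking a file saved in one encoding and read in another.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String left = misread("\u201c");   // left double curly quote
        String right = misread("\u201d");  // right double curly quote
        // Neither quote survives, and the two results are identical.
        System.out.println(left.equals(right));
    }
}
```

Once both quotes have collapsed into the same replacement text, the compiler has no way to tell the case labels apart.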

Unicode literals are in hexadecimal:

case '\u201c':
    sb.append("&#8220;");
    break;
....
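Putting that together, the method from the question might look something like the sketch below. It covers only the four curly quotes (the other escape characters the question omitted are still omitted), and the class name is an assumption added for illustration. Because every special character is written as a hexadecimal escape, the source file is pure ASCII and should compile the same way whatever encoding the compiler assumes.

```java
public class CurlyQuoteEscaper {
    // Replaces curly quotes with numeric HTML entities.
    // The escapes are hexadecimal; the entities use the same values in decimal.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u201c': // left double quote, decimal 8220
                    sb.append("&#8220;");
                    break;
                case '\u201d': // right double quote, decimal 8221
                    sb.append("&#8221;");
                    break;
                case '\u2018': // left single quote, decimal 8216
                    sb.append("&#8216;");
                    break;
                case '\u2019': // right single quote, decimal 8217
                    sb.append("&#8217;");
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints &#8220;It&#8217;s fine&#8221;
        System.out.println(escapeXml("\u201cIt\u2019s fine\u201d"));
    }
}
```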

And, as mentioned in the other answers, you've got a / instead of a \ in one of your literals.

The compiler problem is because you've got '/u8221' instead of '\u8221' - a forward slash instead of a backslash.

I'm not entirely convinced that using the entities will help, but you can try... I suppose it depends on how broken the downstream code is.

EDIT: Doh, I hadn't spotted that your Unicode values were in decimal. Yes, they need to be in hex :) I'll leave this answer here as it explains why the compiler was complaining - '\u8221' is a perfectly valid character escape sequence, just not the one you wanted :)
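The decimal/hex mix-up is easy to check with the standard Integer methods. This is just a small sketch; the class name and the toJavaEscape helper are hypothetical, and the helper only handles code points that fit in four hex digits.

```java
public class EscapeRadixDemo {
    // Hypothetical helper: formats a code point as the text of a Java
    // Unicode escape (backslash, 'u', four hex digits).
    static String toJavaEscape(int codePoint) {
        return String.format("\\u%04x", codePoint);
    }

    public static void main(String[] args) {
        // The entity &#8221; is decimal; the Java escape needs the value in hex.
        System.out.println(Integer.toHexString(8221));     // prints 201d
        // Reading the digits 8221 as hex names a different code point.
        System.out.println(Integer.parseInt("8221", 16));  // prints 33313
        // The hex escape and the decimal entity refer to the same character.
        System.out.println((int) '\u201d' == 8221);        // prints true
    }
}
```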

The default encoding varies from platform to platform: Windows uses its own ISO-Latin-1 dialect (at least on the systems I've worked on), Linux frequently uses UTF-8 (which is most likely your problem), and Mac uses MacRoman. You can circumvent most of your problems by keeping your source code to plain 7-bit ASCII and using \u escapes for anything above that.

Personally, I would keep anything "national" outside the Java source and use the localization features to look up translated strings by simple keys, with only the keys placed in your Java code.

A better approach would be to use StringEscapeUtils from Apache Commons Lang: http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html
