StringEscapeUtils.unescapeHtml() problem in unescaping HTML entities for Android

This is what I'm doing:

public static String htmlToText(String inString)
{
    String noentity = StringEscapeUtils.unescapeHtml(inString);
    return noentity;
}

This is where I'm invoking it:

String html = "<html><body>string 1<br />&#8212;<p>string 2</p></body></html>";
String nohtml = Utility.htmlToText(html);
Log.i("NON HTML STRING:", nohtml);

And this is the output in the log:

10-13 12:38:12.121: INFO/NON HTML STRING:(300): <html><body>string 1<br />â<p>string 2</p></body></html>

According to the reference at http://www.w3.org/TR/html4/sgml/entities.html, &#8212; should be replaced by "—" (the output I expect), not by "â".

At first I was using JSoup and the same thing happened. Thinking it was a bug, I switched to org.apache.commons.lang, and the same thing is happening.

Anyone else know what's going on here? Am I missing something obvious?


Resolved.

It turned out to be a problem with how the output is displayed in Logcat, not with the unescaping.

Setting a breakpoint and inspecting the string in the debugger showed that the actual value was correct.

This is the second time the Logcat tool has thrown me off course.
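For anyone who wants to confirm this without a debugger, here is a minimal sketch in plain Java. It does two things: it dumps a string with every non-ASCII character written as a backslash-u escape, so the log viewer's encoding cannot hide the real value, and it shows one plausible way a "â" could appear, assuming the viewer decoded the UTF-8 bytes of U+2014 as ISO-8859-1. That decoding mismatch is my assumption, not something verified against Logcat itself. The class name LogcatCheck and the helper toAsciiEscaped are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class LogcatCheck {

    // Hypothetical helper: returns the string with every character outside
    // printable ASCII replaced by a backslash-u escape with four hex digits.
    static String toAsciiEscaped(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption for illustration: a viewer that reads UTF-8 bytes as
    // ISO-8859-1 turns one character into one character per byte.
    static String misdecodeAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for the correctly unescaped string (contains U+2014).
        String unescaped =
            "<html><body>string 1<br />\u2014<p>string 2</p></body></html>";

        // The real content of the string, independent of display encoding.
        System.out.println(toAsciiEscaped(unescaped));

        // What the em dash alone becomes when its UTF-8 bytes are misread.
        System.out.println(toAsciiEscaped(misdecodeAsLatin1("\u2014")));
    }
}
```

The first line printed is the original markup with `\u2014` in place of the dash, which confirms the string itself is correct. The second line is `\u00E2\u0080\u0094`: U+00E2 is "â", and the other two are invisible control characters, which would match what showed up in the log. On Android, the same check would be passing `toAsciiEscaped(nohtml)` to `Log.i` instead of the raw string.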
