StringEscapeUtils.unescapeHtml() problem in unescaping HTML entities for Android
This is what I'm doing:
public static String htmlToText(String inString)
{
    String noentity = StringEscapeUtils.unescapeHtml(inString);
    return noentity;
}
This is where I'm invoking it:
String html = "<html><body>string 1<br />&mdash;<p>string 2</p></body></html>";
String nohtml = Utility.htmlToText(html);
Log.i("NON HTML STRING:",nohtml);
And this is the output in the log:
10-13 12:38:12.121: INFO/NON HTML STRING:(300): <html><body>string 1<br />â<p>string 2</p></body></html>
According to the reference at http://www.w3.org/TR/html4/sgml/entities.html, &mdash;
should be replaced by a "—" (which is the output I expect) and not a "â" (which is not what I want).
At first I was using JSoup and the same thing was happening. Thinking it to be a bug, I switched to org.apache.commons.lang and the same thing is happening.
Anyone else know what's going on here? Am I missing something obvious?
Resolved.
It was a problem with the output in Logcat.
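One plausible explanation for the stray "â", offered as an assumption and not something confirmed in this thread: the em dash is U+2014, which UTF-8 encodes as three bytes (E2 80 94). If a viewer interprets those bytes as ISO-8859-1, the first byte shows up as "â" and the other two become invisible control characters. A minimal sketch of that effect follows; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    // This mimics a viewer that assumes the wrong charset; whether Logcat
    // did exactly this is an assumption.
    public static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u2014");          // em dash
        System.out.println(garbled.length());          // 3
        System.out.println((int) garbled.charAt(0));   // 226, i.e. U+00E2 'â'
    }
}
```

The string in memory is unchanged in this scenario; only the rendering is wrong, which would match the correct value seen in the debugger.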
Setting a breakpoint showed me the actual output, which was correct.
This is the second time the Logcat tool has thrown me off course....
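To check a string without relying on how Logcat renders it, one option is to log its code points instead of the characters themselves. The helper below is a sketch with hypothetical names; it uses only the standard library, and in the real code you would pass it the result of StringEscapeUtils.unescapeHtml().

```java
public class CodePointDump {
    // Returns each code point of s as U+XXXX, separated by spaces.
    public static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for an unescaped string that contains an em dash.
        String unescaped = "1\u2014p";
        System.out.println(toCodePoints(unescaped)); // U+0031 U+2014 U+0070
    }
}
```

If U+2014 appears in the dump, the unescaping worked and any "â" in the log is a display issue.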