How to unescape non-standard characters in XML in Java?
I realize a similar question has been asked before, and the accepted solution is to use StringEscapeUtils.unescapeXml(). However, per the method description:
Supports only the five basic XML entities (gt, lt, quot, amp, apos). Does not support DTDs or external entities.
I have a bunch of XML files with escaped characters like ␣ and &hyph;. How can I unescape these? They are defined in the DTD provided. Is there a method like StringEscapeUtils but one with DTD support?
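Hmm, it's been a long time, but I think an implementation of EntityResolver2 (org.xml.sax.ext, included in the Java SE API) handles externally defined entities. It is part of the SAX2 extensions. You don't unescape the strings yourself: a SAX parser that can see the DTD expands the entities it declares, and EntityResolver2 lets you hand the parser that DTD, either in place of the one named in a DOCTYPE or as an external subset for documents that have no DOCTYPE at all.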
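Here is a minimal sketch of that approach using only the JDK's built-in SAX parser. It assumes the entity declarations are available as a string. The `ENTITY_DECLS` constant and the code points it maps `hyph` and `blank` to are hypothetical stand-ins for your real DTD, as is the class name `DtdEntityUnescaper`. `DefaultHandler2` implements `EntityResolver2`, so overriding `getExternalSubset` supplies the declarations whenever a document has no DOCTYPE. The parser then expands `&hyph;` and similar entities itself, and the collected character data comes back unescaped.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DtdEntityUnescaper {

    // Hypothetical declarations for illustration only; in practice, load the
    // DTD that ships with your XML files instead of hard-coding it here.
    static final String ENTITY_DECLS =
        "<!ENTITY hyph \"&#x2010;\">\n" +
        "<!ENTITY blank \"&#x2423;\">\n";

    /** Parses the document and returns its character data with entities expanded. */
    public static String textContent(String xml) throws Exception {
        StringBuilder text = new StringBuilder();

        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public InputSource getExternalSubset(String name, String baseURI) {
                // Called only for documents without a DOCTYPE: hand the parser
                // the entity declarations as an external DTD subset.
                return new InputSource(new StringReader(ENTITY_DECLS));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references arrive here already replaced by their text.
                text.append(ch, start, length);
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // EntityResolver2 callbacks are only used while this feature is on
        // (it is on by default; set explicitly here to document the dependency).
        parser.getXMLReader().setFeature(
            "http://xml.org/sax/features/use-entity-resolver2", true);
        // parse(InputSource, DefaultHandler) also registers the handler as the entity resolver.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textContent("<p>re&hyph;entry&blank;point</p>"));
    }
}
```

If your files already carry a DOCTYPE that names the DTD, `getExternalSubset` is not called. The parser loads the declared DTD instead, and you would override `resolveEntity(name, publicId, baseURI, systemId)` to return a local copy of it when the system ID isn't reachable.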