Special characters in XML encoding using DOM and Java?
I have some code that transforms an Excel file into an XML one, but when a cell's text contains special characters, I'm unable to handle them correctly. For example, a cell contains text like:
(Destinataire de flux entrants ou Origine de flux sortants) **==>** trallla
When transforming it into XML, I get:
(Destinataire de flux entrants ou Origine de flux sortants) **==&gt;** trallla
How can I get around this problem?
You generally do not want a raw '>' inside a value in an XML tag, as it is the character that denotes the end of a tag. If it is substituted with &gt; automatically, be happy that it is. Strictly speaking, only '<' and '&' must be escaped in text content (leaving those raw would make the XML unusable), but escaping '>' as well is harmless and common. Any XML parser that reads the file afterwards will know how to handle the &gt; and turn it back into '>'.
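To illustrate the round trip described above, here is a minimal sketch using only the JDK's built-in DOM and Transformer APIs. The class name, method names, and the `cell` element name are made up for the example and are not taken from the question's code. It writes a text value containing '>' and reads it back through a parser; the serialized form will typically show the entity, while the value read back is the original string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EscapeRoundTrip {

    // Builds a one-element document holding the cell text and serializes it.
    // "cell" is a hypothetical element name chosen for this sketch.
    static String toXml(String cellText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element cell = doc.createElement("cell");
            cell.setTextContent(cellText); // raw text; the serializer does the escaping
            doc.appendChild(cell);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML again and returns the root element's text, with entities resolved.
    static String readBack(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Origine de flux sortants) ==> trallla";
        String xml = toXml(original);
        System.out.println(xml);           // serialized form, escaped where needed
        System.out.println(readBack(xml)); // same text as the original
    }
}
```

In other words, as long as the file is consumed by an XML parser rather than read as plain text, no manual unescaping should be needed.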
You can also wrap the text in a CDATA section, if that helps solve your problem.
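For the CDATA route, one possible sketch with the JDK's Transformer is to list the element in the standard `OutputKeys.CDATA_SECTION_ELEMENTS` output property, so that its text is written inside a CDATA section instead of being entity-escaped. The class name, method names, and the `cell` element name are assumptions for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CdataExample {

    // Serializes the cell text so that it is wrapped in <![CDATA[ ... ]]>.
    static String toXmlWithCdata(String cellText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element cell = doc.createElement("cell"); // hypothetical element name
            cell.setTextContent(cellText);
            doc.appendChild(cell);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Text children of <cell> are emitted as CDATA sections.
            transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "cell");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML again; the CDATA wrapper is transparent to getTextContent().
    static String readBack(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = toXmlWithCdata("Origine de flux sortants) ==> trallla");
        System.out.println(xml);
        System.out.println(readBack(xml));
    }
}
```

This only changes how the file looks to a human reader; a parser returns the same text either way.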
If you have problems reading escaped HTML characters, you can use the Apache Commons Lang library, which includes the method StringEscapeUtils.unescapeHtml(..) (named unescapeHtml4(..) in Commons Lang 3).
The unescaped String is the input you want.