When I parse an XML document with JDOM, some '\r' characters are lost
When I parse an XML document with JDOM, I find that some '\r' characters are lost in the Document object.
For instance: aa\r\nbb
After parsing it, I find that the 'text' property of Element 'b' is 'aa\nbb'.
Does anybody know why the original '\r' is lost? Any suggestions are appreciated.
Thanks.
The XML spec requires the parser to normalize line endings to \n; see the section on line endings.
As @superfell points out, the XML specification requires an XML parser to normalize line endings to '\n' characters.
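To see the normalization in isolation, here is a minimal sketch. It assumes the JDK's built-in DOM parser rather than JDOM, on the grounds that the normalization is done by the underlying XML parser; the class and helper names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK DOM parser as a stand-in for JDOM.
class LineEndingNormalizationDemo {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Makes CR and LF visible when printed.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // The input contains a literal CR LF pair; the parser hands back a single LF.
        System.out.println(visible(rootText("<b>aa\r\nbb</b>"))); // prints aa\nbb
        // A lone CR is also turned into LF.
        System.out.println(visible(rootText("<b>aa\rbb</b>")));   // prints aa\nbb
    }
}
```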
What can you do about it? Not a lot!
1. You could use a character entity whose value is or contains a carriage return character. My reading of the normalization rules is that this will turn into a carriage return character in the normalized XML. However, this means you will have to change your input XML.
2. You could change the application to replace the newlines with the appropriate platform-specific line endings ... after extracting them from the DOM.
(You could even change the XML to represent the text in an encoded form; e.g. hexadecimal or base64. However, that's extremely ugly, and defeats the purpose of using XML.)
Of these, option 2 seems the least unattractive ...
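A rough sketch of the first two options, again assuming the JDK's built-in DOM parser instead of JDOM (with JDOM the extracted string would come from something like `Element.getText()`). The class and method names here are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of JDOM or any library.
class CarriageReturnWorkarounds {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Post-processing step: rewrite every line ending in the extracted
    // text to the one the application wants.
    static String restoreLineEndings(String text, String lineEnding) {
        // Collapse any surviving CR LF pairs first so they are not doubled.
        return text.replace("\r\n", "\n").replace("\n", lineEnding);
    }

    public static void main(String[] args) {
        // Character reference approach: &#13; is not subject to line-ending
        // normalization, so the carriage return survives parsing.
        String viaReference = rootText("<b>aa&#13;\nbb</b>");
        System.out.println(viaReference.equals("aa\r\nbb")); // prints true

        // Post-processing approach: parse as usual, then put back the
        // platform-specific line separator.
        String normalized = rootText("<b>aa\r\nbb</b>");
        String restored = restoreLineEndings(normalized, System.lineSeparator());
        System.out.println(restored.equals("aa" + System.lineSeparator() + "bb")); // prints true
    }
}
```

The post-processing approach leaves the input XML untouched, but it cannot tell which newlines were originally '\r\n' and which were plain '\n'; every line ending comes out the same.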