
When I parse an XML document with JDOM, some '\r' characters are lost

When I parse an XML document with JDOM, I find that some '\r' characters are lost in the Document object.

For instance: aa\r\nbb

After parsing it, I find that the 'text' property of Element 'b' is 'aa\nbb'.

Does anybody know why the original '\r' is lost? Any suggestions appreciated.

Thanks.


The XML spec requires the parser to normalize line endings to \n; see the section on line endings.
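
A quick way to see the normalization is sketched below. It uses the JDK's built-in DOM parser instead of JDOM so that it runs without extra libraries; that swap is an assumption of this sketch, as are the class name LineEndingDemo and the helper rootText. JDOM's SAXBuilder sits on top of a SAX parser, so the same behavior would be expected there.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class LineEndingDemo {
    // Hypothetical helper: parses the XML string and returns the root element's text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // A literal CR LF pair in the input should come back as a single LF.
        System.out.println(rootText("<b>aa\r\nbb</b>").equals("aa\nbb"));

        // A lone CR (not followed by LF) should also come back as LF.
        System.out.println(rootText("<b>aa\rbb</b>").equals("aa\nbb"));
    }
}
```

Both comparisons are expected to hold for a conforming parser, since the normalization happens on the raw input before the document is parsed.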


As @superfell points out, the XML specification requires an XML parser to normalize line endings to '\n' characters.

What can you do about it? Not a lot!

  1. You could use a character reference whose value is a carriage return character (for example '&#13;'). My reading of the normalization rules is that this will turn into a carriage return character in the parsed text, because references are expanded after line endings have been normalized. However, this means you will have to change your input XML.

  2. You could change the application to replace the newlines with the appropriate platform-specific line endings ... after extracting them from the DOM.

  3. (You could even change the XML to represent the text in an encoded form; e.g. hexadecimal or base64. However, that's extremely ugly, and defeats the purpose of using XML.)

Of these, option 2 seems the least unattractive ...
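
Options 1 and 2 can be sketched as follows. As an assumption made to keep the example self-contained, this uses the JDK's built-in DOM parser rather than JDOM; the class name CarriageReturnOptions and the helpers rootText and withLineSeparator are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CarriageReturnOptions {
    // Hypothetical helper: parses the XML string and returns the root element's text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Option 2: convert parser-normalized text to the separator the application needs.
    static String withLineSeparator(String text, String separator) {
        // Collapse CR LF pairs first so an already-present CR is not doubled.
        return text.replace("\r\n", "\n").replace("\n", separator);
    }

    public static void main(String[] args) {
        // Option 1: the CR is written as the character reference &#13; in the input XML.
        String viaReference = rootText("<b>aa&#13;\nbb</b>");
        System.out.println(viaReference.equals("aa\r\nbb"));

        // Option 2: the input has a plain line break; CR LF is restored after parsing.
        String restored = withLineSeparator(rootText("<b>aa\r\nbb</b>"), "\r\n");
        System.out.println(restored.equals("aa\r\nbb"));
    }
}
```

If the platform's own convention is wanted instead of a fixed "\r\n", System.lineSeparator() could be passed as the separator. With JDOM, the string returned by Element.getText() would take the place of rootText here.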
