SAX characters breaking element apart [duplicate]
Possible Duplicate:
JAVA SAX parser split calls to characters()
I have an XML file with the following syntax:
<tag ...>
a bunch of text here
<tag ...>
There aren't any closing tags for tag. I'm grabbing the text between the two tags in characters() and storing it in a List<String>. It works for the most part, but on some XML files it reads a line terminator or something that breaks the text in two. Rather than storing a single entry, "a bunch of text here", I get two entries: "a bunch of" and "text here". What's different is that, unlike all the other entries, no line break is stored after "a bunch of" or before "text here".
I need to fix this, but don't know how. I'd appreciate your help.
The parser is allowed to call the ContentHandler characters method multiple times for a single run of element text, so it isn't necessarily finding a line terminator. The Java tutorial on SAX has a short explanation of the characters method:
Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.
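Following that advice, a handler for the question's layout could look roughly like the sketch below. It is only an illustration under a few assumptions: the tags are written as well-formed, self-closing elements (e.g. <tag id="1"/>), because a SAX parser rejects unclosed tags; the class name TextCollector, the id attribute and the choice to trim() surrounding whitespace are all hypothetical, not taken from the question. characters() only appends to a StringBuilder (the unsynchronized counterpart of StringBuffer), and the buffered text is turned into a List<String> entry at the next element boundary, when all of its pieces have arrived.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text found between element boundaries.
public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> segments = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so only accumulate here.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        flush(); // a new tag ends the current run of text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    @Override
    public void endDocument() {
        flush(); // keep any trailing text
    }

    // Store the accumulated text as one entry (assumption: whitespace-only runs
    // are skipped and surrounding whitespace is trimmed), then reset the buffer.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            segments.add(text);
        }
        buffer.setLength(0);
    }

    public List<String> getSegments() {
        return segments;
    }

    // Parses an XML string and returns the collected text segments.
    public static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getSegments();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><tag id=\"1\"/>a bunch of text here<tag id=\"2\"/></root>";
        System.out.println(collect(xml)); // [a bunch of text here]
    }
}
```

Flushing at every element boundary is the simplest rule. If other elements can appear inside the text you want to keep, flush only when qName matches your tag instead, and drop the trim() call if the exact whitespace matters.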
Also, this JavaWorld article has good explanations and examples.