Java Split XML file
How can I split a long XML file into pieces, each with a different predefined name?
For example, this is my XML file: several documents pasted into one long file, generated for testing. Now I have to split it on envelope, writing each one to a new file.
<envelope>
<tag1>1</tag1>
<tag2>2</tag2>
<tag3>3</tag3>
</envelope>
<envelope>
<tag1>1</tag1>
<tag2>2</tag2>
<tag3>3</tag3>
</envelope>
<envelope>
<tag1>1</tag1>
<tag2>2</tag2>
<tag3>3</tag3>
</envelope>
I have already worked with splits before, just not like this, where there is no opening and closing tag for the entire XML.
I suggest making it well formed and then using one of the SAX or StAX solutions as suggested. The only difference is that I would avoid loading the whole thing into memory and instead inject the start and end elements by way of a SequenceInputStream.
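For example:
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Streams "<root>", then the file, then "</root>", without buffering the file.
InputStream in = new SequenceInputStream(
// start doc
new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
new SequenceInputStream(
new FileInputStream("envelopes.txt"),
// end doc
new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8))));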
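To show how the wrapped stream could then be split, here is a minimal StAX sketch. The class name EnvelopeStaxSplitter, the method split and the sink callback are made up for illustration. It assumes the envelopes are not nested, that the file is UTF-8, and that it carries no XML declaration or byte order mark (either would end up after <root> and break parsing).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of any library.
final class EnvelopeStaxSplitter {

    private static final String ENVELOPE = "envelope";

    // Hands each <envelope>...</envelope> fragment to the sink, one at a time.
    static void split(InputStream fragments, Consumer<String> sink) {
        // Inject an artificial root so the parser sees one well-formed document.
        InputStream wrapped = new SequenceInputStream(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                new SequenceInputStream(
                        fragments,
                        new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8))));
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(wrapped);
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside an envelope

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null && event.isStartElement()
                        && ENVELOPE.equals(event.asStartElement().getName().getLocalPart())) {
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (writer != null) {
                    writer.add(event); // copy the event into the current fragment
                    if (event.isEndElement()
                            && ENVELOPE.equals(event.asEndElement().getName().getLocalPart())) {
                        writer.flush();
                        writer.close();
                        sink.accept(buffer.toString());
                        writer = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Input is not well formed once wrapped in <root>", e);
        }
    }
}
```

A caller could pass a sink that writes each fragment to the file with its predefined name, so only one envelope is held in memory at any time.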
As Joachim said, this is not XML.
You can try adding a root element programmatically, saving the file as a temp file somewhere, and then referring to the other, similar question on how to split it.
Answering the comment:
This might help you load it. I doubt you should worry about the size, since to split it you'd have to load it in memory anyway and then write it again.
Then something like:
final String xmlWithRootElement = "<root>" + IOUtils.toString(yourFile) + "</root>";
should do it. (without so many hardcoded strings)
One last thing.
I would suggest finding a solution that works. Then if you're unhappy with the performance you can look for ways to optimize it or you can ask a performance related question.
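The temp-file variant mentioned in this answer could look roughly like the sketch below. The names RootWrapper and wrapInRoot are made up for illustration, and it assumes the source file is UTF-8 with no XML declaration of its own. Unlike the string concatenation, it copies the file through a stream instead of holding it in memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class RootWrapper {

    // Writes <root> + contents of source + </root> to a new temp file and returns its path.
    static Path wrapInRoot(Path source) {
        try {
            Path temp = Files.createTempFile("wrapped-", ".xml");
            try (OutputStream out = Files.newOutputStream(temp)) {
                out.write("<root>".getBytes(StandardCharsets.UTF_8));
                Files.copy(source, out); // streams the file, no full in-memory copy
                out.write("</root>".getBytes(StandardCharsets.UTF_8));
            }
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned file is well formed, so any of the usual XML splitting approaches can be applied to it.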
How about just reading the file character by character and identifying the <envelope> and </envelope> sequences? Whenever you encounter <envelope>, you start capturing to a buffer until you reach </envelope>. This way the file can be as big as the filesystem allows. XML manipulation on large files is a headache :-)
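A rough sketch of that character-by-character scan follows. The names EnvelopeScanner and split and the sink callback are made up for illustration. It only matches the literal strings <envelope> and </envelope>, so it assumes the opening tag has no attributes or extra whitespace and that the tags never appear inside comments or CDATA.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of any library.
final class EnvelopeScanner {

    private static final String OPEN = "<envelope>";
    private static final String CLOSE = "</envelope>";

    // Reads one character at a time and hands each complete envelope to the sink.
    static void split(Reader in, Consumer<String> sink) {
        StringBuilder window = new StringBuilder();
        boolean capturing = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                window.append((char) c);
                if (!capturing) {
                    if (endsWith(window, OPEN)) {
                        // Start of an envelope: keep only the opening tag and capture from here.
                        window.setLength(0);
                        window.append(OPEN);
                        capturing = true;
                    } else if (window.length() > OPEN.length()) {
                        // Outside an envelope only the last few characters matter.
                        window.deleteCharAt(0);
                    }
                } else if (endsWith(window, CLOSE)) {
                    sink.accept(window.toString());
                    window.setLength(0);
                    capturing = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if sb ends with suffix; checks only the tail, so the cost per character stays constant.
    private static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        return start >= 0 && sb.indexOf(suffix, start) == start;
    }
}
```

Wrapping the file in a BufferedReader keeps the single-character reads cheap, and the sink can write each captured envelope straight to its own file.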