Java Split XML file

How can I split a long XML file into pieces, each with its own predefined name?

For example, this is my XML file: many documents pasted into one long file, generated for testing. Now I have to split it on envelope, writing each one to a new file.

<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>
<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>
<envelope>
 <tag1>1</tag1>
 <tag2>2</tag2>
 <tag3>3</tag3>
</envelope>

I have worked with splits before, just not like this, where there is no begin and end tag for the entire XML.


I suggest making it well-formed and then using one of the SAX or StAX solutions already suggested. The only difference is that I would avoid loading the whole thing into memory and instead inject the start and end elements by way of a SequenceInputStream.

For example:

InputStream in = new SequenceInputStream(
                        // start doc
                        new ByteArrayInputStream("<root>".getBytes()),
                        new SequenceInputStream(
                           new FileInputStream("envelopes.txt"),
                           // end doc
                           new ByteArrayInputStream("</root>".getBytes())));
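A minimal sketch of how the wrapped stream could then be split with StAX, one envelope at a time. The names EnvelopeSplitter, wrapInRoot, split and sink are hypothetical, and the sketch assumes the input has no XML declaration, that envelope elements are not nested, and that the content is UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EnvelopeSplitter {

    // Surrounds the rootless fragments with a synthetic <root> element (assumed name).
    static InputStream wrapInRoot(InputStream fragments) {
        return new SequenceInputStream(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                new SequenceInputStream(
                        fragments,
                        new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8))));
    }

    // Hands each <envelope> element to the sink as a string, in document order.
    // Only one envelope is buffered at a time.
    static void split(InputStream fragments, Consumer<String> sink) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(wrapInRoot(fragments));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter buffer = null;
        XMLEventWriter writer = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            // An opening <envelope> starts a new output buffer.
            if (event.isStartElement()
                    && "envelope".equals(event.asStartElement().getName().getLocalPart())) {
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
            }

            // Copy events only while inside an envelope; the synthetic root is skipped.
            if (writer != null) {
                writer.add(event);
            }

            // A closing </envelope> finishes the current piece.
            if (writer != null && event.isEndElement()
                    && "envelope".equals(event.asEndElement().getName().getLocalPart())) {
                writer.flush();
                writer.close();
                sink.accept(buffer.toString());
                writer = null;
            }
        }
        reader.close();
    }
}
```

The sink could, for instance, write each piece with Files.writeString to the next name taken from your predefined list, so nothing larger than a single envelope is held in memory.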


As Joachim said, this is not well-formed XML.

You can try to add a root element programmatically, save the result as a temporary file somewhere, and then refer to the other similar question on how to split it.


Answering the comment:

This might help you load it. I doubt you should worry about the size, since to split it you'd have to load it into memory anyway and then write it out again.

Then something like:

final String xmlWithRootElement = "<root>" + IOUtils.toString(yourFile) + "</root>";

should do it (ideally without so many hard-coded strings).

One last thing.

I would suggest finding a solution that works. Then if you're unhappy with the performance you can look for ways to optimize it or you can ask a performance related question.


How about just reading the file character by character and identifying the <envelope> and </envelope> sequences? Whenever you encounter <envelope>, you start capturing to a buffer until you reach </envelope>. This way the file can be as big as the filesystem allows. XML manipulation on large files is a headache :-)
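A rough sketch of that scanning idea, under the assumption that the tags appear literally as <envelope> and </envelope> (no attributes, no nesting, nothing hidden in comments or CDATA). EnvelopeScanner, scan and sink are hypothetical names:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

class EnvelopeScanner {
    private static final String OPEN = "<envelope>";
    private static final String CLOSE = "</envelope>";

    // Reads character by character and passes every <envelope>...</envelope>
    // span to the sink. Only one envelope is buffered at a time.
    static void scan(Reader in, Consumer<String> sink) throws IOException {
        StringBuilder window = new StringBuilder(); // last OPEN.length() chars seen outside an envelope
        StringBuilder capture = null;               // non-null while inside an envelope
        int c;
        while ((c = in.read()) != -1) {
            if (capture == null) {
                // Slide a fixed-size window over the input until it equals the opening tag.
                window.append((char) c);
                if (window.length() > OPEN.length()) {
                    window.deleteCharAt(0);
                }
                if (OPEN.contentEquals(window)) {
                    capture = new StringBuilder(OPEN);
                    window.setLength(0);
                }
            } else {
                // Inside an envelope: keep everything until the closing tag shows up.
                capture.append((char) c);
                if (endsWith(capture, CLOSE)) {
                    sink.accept(capture.toString());
                    capture = null;
                }
            }
        }
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        return start >= 0 && sb.indexOf(suffix, start) == start;
    }
}
```

Wrapping the file in a BufferedReader before calling scan keeps the per-character reads cheap. Anything outside an envelope is simply ignored, so no root element is needed.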
