
How do I compress XML from XML::Simple::XMLout?

I am using XML::Simple to parse and edit a very large XML file, and speed is essential (of all the methods I have tried so far, XML::Simple has been the fastest).

Once all my edits are complete, I write the XML to a file using XMLout(). However, it prints the output with indentation, which would be nice if humans were reading it but is completely useless in my situation.

The output file is 1.2 MB without whitespace and 15 MB with it.

I have been using:

my $string = XMLout($data);
$string =~ s/>[\s]*</></g;
print $out $string;

But this seems to be not only an extreme CPU hog but also to take an enormous amount of memory.

Is there a way to simply output my XML object as proper XML without all the useless whitespace?

Thanks


Look at the NoIndent option. From the XML::Simple manpage:

NoIndent => 1 # out - seldom used

Set this option to 1 to disable XMLout()'s default 'pretty printing' mode. With this option enabled, the XML output will all be on one line (unless there are newlines in the data) - this may be easier for downstream processing.

NormaliseSpace => 0 | 1 | 2 # in - handy

This option controls how whitespace in text content is handled. Recognised values for the option are:

  • 0 = (default) whitespace is passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation)

  • 1 = whitespace is normalised in any value used as a hash key (normalising means removing leading and trailing whitespace and collapsing sequences of whitespace characters to a single space)

  • 2 = whitespace is normalised in all text content

    Note: you can spell this option with a 'z' if that is more natural for you.
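
NormaliseSpace is marked as an input option, so it changes what XMLin() returns rather than how XMLout() indents, and on its own it will not shrink the output file. A minimal sketch of the difference between values 0 and 2, using a made-up snippet:

```perl
use strict;
use warnings;
use XML::Simple;

# Made-up input with extra spaces inside the text content.
my $xml = '<opt><note>  first   second  </note></opt>';

# NormaliseSpace => 0 (the default): text content is passed through unaltered.
my $raw = XMLin($xml);

# NormaliseSpace => 2: leading and trailing whitespace is removed and
# internal runs of whitespace are collapsed to a single space in all text.
my $norm = XMLin($xml, NormaliseSpace => 2);

print "raw:  [$raw->{note}]\n";
print "norm: [$norm->{note}]\n";
```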


Just set the NoIndent option in the call to XMLout(), like this:

my $string = XMLout($data, NoIndent=>1);

Tada!
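
If the result is only going to be written to a file, XMLout() can also write it there directly through its OutputFile option, which accepts either a filename or an open filehandle. A minimal sketch, assuming $data is the structure you got from XMLin() and edited and $out is an already-open output handle; write_compact_xml is a hypothetical helper name. As far as I can tell, XMLout() still assembles the whole document as one string internally before printing it, so the main saving is the separate substitution pass, not the memory for the output itself.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: write $data as compact XML to the open handle $out.
sub write_compact_xml {
    my ($data, $out) = @_;
    # NoIndent => 1 turns off pretty printing, so no whitespace is
    # inserted between tags and no regex clean-up pass is needed.
    # OutputFile accepts a filename or an open filehandle.
    return XMLout($data, NoIndent => 1, OutputFile => $out);
}
```

Usage would be something like `open my $out, '>', 'edited.xml' or die $!; write_compact_xml($data, $out); close $out;`, where the file name is only an example.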


An event-driven XML parser is going to be faster than something that needs to load the whole thing into memory at once.
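
For comparison, here is a minimal event-driven sketch using XML::Parser (a swapped-in module, not something the question uses; xml_escape and stream_compact are hypothetical names). It re-emits each start tag, end tag and text node as the parser reports it and drops whitespace-only text between tags, so neither the full tree nor the full output string has to be held in memory. It is only a sketch: comments, processing instructions, the XML declaration and any DOCTYPE are not copied, and for non-ASCII data you would want an encoding layer such as :encoding(UTF-8) on the output handle, because XML::Parser hands back decoded characters.

```perl
use strict;
use warnings;
use XML::Parser;

# Escape a string for use in element content or a double-quoted attribute.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: stream $xml to $out, dropping whitespace-only text.
sub stream_compact {
    my ($xml, $out) = @_;
    my $text = '';    # Char can deliver one text node in several chunks

    # Print buffered text only if it contains something besides whitespace.
    my $flush = sub {
        print {$out} xml_escape($text) if $text =~ /\S/;
        $text = '';
    };

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem, @attrs) = @_;    # name/value pairs in order
                $flush->();
                my $tag = "<$elem";
                while (my ($name, $value) = splice @attrs, 0, 2) {
                    $tag .= qq{ $name="} . xml_escape($value) . q{"};
                }
                print {$out} "$tag>";
            },
            End => sub {
                my ($expat, $elem) = @_;
                $flush->();
                print {$out} "</$elem>";
            },
            Char => sub {
                my ($expat, $chunk) = @_;
                $text .= $chunk;
            },
        },
    );
    # parse() also accepts an open filehandle; parsefile() takes a filename.
    $parser->parse($xml);
}
```

Your edits would then live inside the Start/End/Char handlers, which is more work than editing a hash, but it is what keeps memory use flat on a large file.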

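You shouldn't do so much extra work in your pattern: >[\s]*< also matches tags with no whitespace between them, so every >< in the document gets replaced with itself. Require at least one whitespace character instead: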

$string =~ s/>\s+</></g;
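
To see the difference, this small pure-Perl sketch (made-up input) counts the substitutions each pattern performs, using the fact that s///g returns the number of substitutions made. Both produce the same string, but the original pattern also matches the </b><c> boundary, where there is nothing to remove, and rewrites it anyway; in a large document that happens at every adjacent pair of tags.

```perl
use strict;
use warnings;

my $xml = "<a>\n  <b>1</b><c>2</c>\n</a>";

# Original pattern: '*' also accepts zero whitespace characters.
my $star = $xml;
my $star_hits = ($star =~ s/>[\s]*</></g);

# Suggested pattern: '+' only matches where there is whitespace to remove.
my $plus = $xml;
my $plus_hits = ($plus =~ s/>\s+</></g);

# prints: star: 3 substitutions, plus: 2 substitutions
print "star: $star_hits substitutions, plus: $plus_hits substitutions\n";
# prints: same result: yes
print "same result: ", ($star eq $plus ? "yes" : "no"), "\n";
```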