How to build an XML tree using an event-based parser in Perl for huge data?
I have an XML file like this:
<Nodes><Node>
<NodeName>Company</NodeName>
<File>employee_details.csv</File>
<data>employee_data.txt</data>
<Node>
<NodeName>dummy</NodeName>
<File>employee_details1.csv</File>
<data>employee_data1.txt</data>
</Node>
</Node>
</Nodes>
#Contents of employee_data.txt
Empname,Empcode,EmpSal:Currency,Empaddr
#Contents of employee_details.csv (like this huge data)
Alex,A001,1000:USD,Bangalore
Aparna,B001,1000:RUBEL,Bombay
#Contents of employee_data1.txt
phone,fax
#Contents of employee_details1.csv (like this huge data)
44568889,123345656
23232323,454545757
Output:
<Company>
<Empname>Alex</Empname>
<Empcode>A001</Empcode>
<EmpSal=USD>1000</EmpSal>
<Empaddr>Bangalore</Empaddr>
<phone>44568889</phone>
<fax>123345656</fax>
</Company>
<Company>
<Empname>Aparna</Empname>
<Empcode>B001</Empcode>
<EmpSal=RUBEL>1000</EmpSal>
<Empaddr>Bombay</Empaddr>
<phone>23232323</phone>
<fax>454545757</fax>
</Company>
I want to build an XML tree with a SAX parser, but I don't understand how to traverse all the nodes and generate an event for each of them.
The result should look like the output above.
How can I do it in Perl?
# .pl file
use XML::SAX::ParserFactory;
use sax_handler;
my $factory = XML::SAX::ParserFactory->new();
# @arguments_to_parse stands for whatever your handler's constructor needs
my $parser = $factory->parser( Handler => sax_handler->new(@arguments_to_parse) );

# sax_handler.pm
package sax_handler;
sub new {
    # nothing special happens here
    my ($type) = @_;
    return bless {}, $type;
}
# the following 2 methods are the important ones
sub start_element {
    my ($self, $element) = @_;
    # attributes of the comment tag; m:text is a tag
    if ( $element->{Name} eq "m:text" ) {
        $self->{name} = $element->{Attributes}->{'{}name'}->{'Value'};
    }
}
# m:reviewID is a tag in your XML
sub end_element {
    my ($self, $element) = @_;
    # write down all tags here, then print or manipulate them
    if ( $element->{Name} eq "m:reviewID" ) {
    }
}
1;
It looks to me that the CSV files can be huge, not the XML one, so there is really no need to use a SAX parser. The XML is only used to give you the locations of 4 files. 2 of those files (the .txt ones) are small and only contain a list of fields; the other 2, the CSV files, can be big.
You should use Text::CSV_XS to parse those 2 huge files. You can then output the XML using plain print; just make sure you escape the text and pay attention to the encoding. BTW, in your sample output <EmpSal=USD> is not well-formed XML: an attribute needs a name and a quoted value, for example <EmpSal Currency="USD">.
Another option is XML::Writer, which will take care of escaping and quoting for you. I don't think generating SAX events and passing them to a SAX writer makes sense in this case; it would be more complex and probably slower than the other options.
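Here is a minimal sketch of the Text::CSV_XS plus XML::Writer approach, not a finished program. It makes a few assumptions that are mine, not the question's: both CSV files have their rows in the same order, a header such as EmpSal:Currency means "element EmpSal with a Currency attribute", the data is plain ASCII (no encoding layers are set), and the records are wrapped in a <Records> root element, a name I made up because a well-formed XML document needs a single root. The sub names read_fields and write_records are hypothetical too.

```perl
use strict;
use warnings;
use Text::CSV_XS;
use XML::Writer;

# Return the field names from the single header line of a .txt file.
sub read_fields {
    my ($csv, $path) = @_;
    open my $fh, '<', $path or die "$path: $!";
    my $row = $csv->getline($fh) or die "$path: no field list found";
    close $fh;
    return @$row;
}

# Write one record element per CSV row to the filehandle $out.
# Each source is [ fields_file, csv_file ]. All CSV files are read in step,
# so only one row per file is held in memory at any time.
sub write_records {
    my ($out, $record_tag, @sources) = @_;
    die "no sources given" unless @sources;
    my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

    my @inputs;
    for my $src (@sources) {
        my ($fields_file, $csv_file) = @$src;
        open my $fh, '<', $csv_file or die "$csv_file: $!";
        push @inputs, { fh => $fh, fields => [ read_fields($csv, $fields_file) ] };
    }

    my $writer = XML::Writer->new(OUTPUT => $out, DATA_MODE => 1, DATA_INDENT => 2);
    $writer->startTag('Records');    # hypothetical root element
    RECORD: while (1) {
        # Fetch the next row from every file; stop when the shortest file ends.
        my @rows;
        for my $in (@inputs) {
            my $row = $csv->getline($in->{fh}) or last RECORD;
            push @rows, $row;
        }
        $writer->startTag($record_tag);
        for my $i (0 .. $#inputs) {
            my @fields = @{ $inputs[$i]{fields} };
            for my $j (0 .. $#fields) {
                my ($tag, $attr) = split /:/, $fields[$j], 2;
                my $value = $rows[$i][$j] // '';
                if (defined $attr) {
                    # "1000:USD" becomes text "1000" and attribute value "USD"
                    my ($text, $attr_value) = split /:/, $value, 2;
                    $writer->dataElement($tag, $text // '', $attr => $attr_value // '');
                } else {
                    $writer->dataElement($tag, $value);
                }
            }
        }
        $writer->endTag($record_tag);
    }
    $writer->endTag('Records');
    $writer->end();
    close $_->{fh} for @inputs;
    return;
}

# Usage with the file names from the question:
# write_records(\*STDOUT, 'Company',
#     [ 'employee_data.txt',  'employee_details.csv'  ],
#     [ 'employee_data1.txt', 'employee_details1.csv' ]);
```

XML::Writer quotes the attribute value and escapes the text, so the salary element comes out with a proper Currency attribute. In a real program the record tag and the file names would come from the small XML file, which any parser can read since it is tiny.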
Well, a SAX parser is slightly different from other parsing techniques. Here you need to write your own handler (a Perl module). The module must contain the following things: 1. a constructor, 2. a start_element subroutine, 3. an end_element subroutine. You can manage events inside those subroutines like this (for a given tag): if( $element->{Name} eq "mail_id"){ $user_mail_id=$self->get_text();}
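To make the handler idea concrete, here is a small sketch, assuming the handler subclasses XML::SAX::Base and collects element text itself in a characters callback. The package name NodeHandler and the helper parse_nodes are made up for illustration. It reads the small XML file from the question and records the NodeName, File and data values of each Node.

```perl
package NodeHandler;    # hypothetical handler name
use strict;
use warnings;
use base 'XML::SAX::Base';

# Reset the collected state at the start of every document.
sub start_document {
    my ($self) = @_;
    $self->{nodes} = [];    # one hash per <Node>, in document order
    $self->{stack} = [];    # currently open <Node> elements
    $self->{text}  = '';
}

sub start_element {
    my ($self, $element) = @_;
    $self->{text} = '';     # start collecting text for this element
    if ($element->{Name} eq 'Node') {
        my $node = {};
        push @{ $self->{nodes} }, $node;
        push @{ $self->{stack} }, $node;
    }
}

# Text can arrive in several chunks, so append rather than assign.
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

sub end_element {
    my ($self, $element) = @_;
    my $name = $element->{Name};
    if ($name eq 'Node') {
        pop @{ $self->{stack} };
    } elsif (@{ $self->{stack} } && $name =~ /^(?:NodeName|File|data)$/) {
        # Store the text on the innermost open <Node>.
        $self->{stack}[-1]{$name} = $self->{text};
    }
    $self->{text} = '';
}

package main;
use XML::SAX::ParserFactory;

# Parse an XML string and return the list of node hashes.
sub parse_nodes {
    my ($xml) = @_;
    my $handler = NodeHandler->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string($xml);
    return @{ $handler->{nodes} };
}
```

Calling parse_nodes on the XML from the question should give two hashes, one for Company and one for dummy, each with its File and data entries. Keeping a stack of open Node elements is one simple way to handle the nesting.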