XML parsing using Perl

I tried to research a simple question but couldn't find an answer. I am trying to fetch XML data from the web and parse it with Perl. I know how to loop over repeating elements, but I am stuck when an element does not repeat (I know this might be silly). If the elements repeat, I put them in an array and get the data. But when there is only a single element, it throws an error saying 'Not an array reference'. I want my code to handle both cases (single and multiple elements). The code I am using is as follows:

use LWP::Simple;
use XML::Simple;
use Data::Dumper;

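# Output file for the parsed terms; stop early if it cannot be opened
open (FH, ">:utf8", "xmlparsed1.txt") or die "Cannot open xmlparsed1.txt: $!";

my $db1   = "pubmed";
my $query = "13054692";
my $q     = 16354118;          # for multiple MeSH terms
my $xml   = XML::Simple->new;

# Fetch the record as XML and parse it into a Perl data structure
my $urlxml  = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db1&id=$query&retmode=xml&rettype=abstract";
my $dataxml = get($urlxml);
my $data    = $xml->XMLin($dataxml);
#print FH Dumper($data);

# Fails with 'Not an array reference' when there is only one MeshHeading
foreach my $e (@{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}})
{
    print FH $e->{DescriptorName}{content}, ' $$ ';
}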

Also, is there a way to avoid printing the separator $$ after the last element? I also tried the following code:

$mesh = $data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading};
while (my ($key, $value) = each(%$mesh)){
    print FH "$value";
}

But this prints all the child nodes, and I just want the content node.


XML::Simple returns an element that occurs only once as a single value (a hash reference or a string), but returns it as an array reference if the element repeats. So, to make your code work, you just have to force MeshHeading to always come back as an array reference:

$data = $xml->XMLin("$dataxml", ForceArray => [qw( MeshHeading )]);
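
As a rough illustration of the effect, here is a minimal sketch that parses two inline XML strings instead of fetching from the web. The XML fragments are hypothetical, trimmed-down stand-ins for the real efetch output, and mesh_terms is a helper name made up for this example. KeyAttr => [] is set only to switch off XML::Simple's default key folding, which is not needed here.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical, trimmed-down fragments shaped like the efetch output.
my $one_heading = <<'XML';
<MeshHeadingList>
  <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
</MeshHeadingList>
XML

my $two_headings = <<'XML';
<MeshHeadingList>
  <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
  <MeshHeading><DescriptorName MajorTopicYN="Y">Neoplasms</DescriptorName></MeshHeading>
</MeshHeadingList>
XML

# mesh_terms is a hypothetical helper: it returns the descriptor names
# found in one XML string, whether there is one MeshHeading or several.
sub mesh_terms {
    my ($xml_text) = @_;
    # ForceArray makes MeshHeading an array reference in both cases.
    my $list = XMLin($xml_text, ForceArray => ['MeshHeading'], KeyAttr => []);
    return map { $_->{DescriptorName}{content} } @{ $list->{MeshHeading} };
}

print join(' $$ ', mesh_terms($one_heading)),  "\n";
print join(' $$ ', mesh_terms($two_headings)), "\n";
```

The same loop body now works for both inputs, because the dereference @{ ... } always receives an array reference.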


I think you missed the part of "perldoc XML::Simple" that talks about the ForceArray option:

check out ForceArray because you'll almost certainly want to turn it on

Then you will always get an array, even if the array contains only one element.


As others have pointed out, the ForceArray option will solve this particular problem. However, you'll undoubtedly strike another problem soon after, because XML::Simple's assumptions will not match yours. As the author of XML::Simple, I strongly recommend you read "Stepping up from XML::Simple to XML::LibXML" - if nothing else, it will teach you more about XML::Simple.
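
For a sense of what the XML::LibXML route can look like, here is a minimal sketch under some assumptions: the inline XML is a hypothetical, trimmed-down stand-in for the real record, and descriptor_names is a helper name invented for this example. An XPath query returns a list of nodes whether one or many match, so the single-versus-repeated distinction does not arise.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, trimmed-down record shaped like the efetch output.
my $xml_text = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <MeshHeadingList>
        <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName MajorTopicYN="Y">Neoplasms</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# descriptor_names is a hypothetical helper: it returns the text of every
# DescriptorName element that sits directly under a MeshHeading.
sub descriptor_names {
    my ($text) = @_;
    my $doc = XML::LibXML->load_xml(string => $text);
    return map { $_->textContent } $doc->findnodes('//MeshHeading/DescriptorName');
}

print join(' $$ ', descriptor_names($xml_text)), "\n";
```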


Since $data->{PubmedArticle}-> ... ->{MeshHeading} can be either a single value (a hash reference here, or a plain string for a text-only element) or an array reference, depending on how many <MeshHeading> tags are present in the document, you need to examine the value's type with ref and conditionally dereference it. Since I am unaware of any terse Perl idioms for doing this, your best bet is to write a function:

sub toArray {
 my $meshes = shift;
 if (!defined $meshes) { return () }
 elsif (ref $meshes eq 'ARRAY') { return @$meshes }
 else { return ($meshes) }
}

and then use it like so:

foreach my $e (toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading})) { ... }

To prevent ' $$ ' from being printed after the last element, instead of looping over the list, concatenate all the elements together with join:

print FH join ' $$ ', map { $_->{DescriptorName}{content} }
 toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading});
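
To see the helper and join working together without a network call, here is a small self-contained sketch. The two data structures are hand-built assumptions that mimic what XML::Simple would return for one and for several MeshHeading elements, and joined_terms is a helper name made up for this example.

```perl
use strict;
use warnings;

# Same helper as above: normalise undef, a single value, or an array
# reference into a plain list.
sub toArray {
    my $meshes = shift;
    if (!defined $meshes)          { return () }
    elsif (ref $meshes eq 'ARRAY') { return @$meshes }
    else                           { return ($meshes) }
}

# Hand-built stand-ins for the parsed data (assumed shapes).
my $single   = { DescriptorName => { content => 'Humans' } };
my $multiple = [
    { DescriptorName => { content => 'Humans' } },
    { DescriptorName => { content => 'Neoplasms' } },
];

# joined_terms is a hypothetical helper: join places the separator only
# between elements, so nothing trails after the last one.
sub joined_terms {
    my ($mesh) = @_;
    return join ' $$ ', map { $_->{DescriptorName}{content} } toArray($mesh);
}

print joined_terms($single),   "\n";
print joined_terms($multiple), "\n";
```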


This is a place where XML::Simple is being...simple. It deduces whether there's an array or not by whether something occurs more than once. Read the doc and look for the ForceArray option to address this.

To only include the ' $$ ' between elements, replace your loop with

print FH join ' $$ ', map $_->{DescriptorName}{content}, @{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}};