Extracting Multilevel XML using Perl

I have an XML file as follows:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2010//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_100101.dtd">
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>20555148</PMID>
        <DateCreated>
            <Year>2010</Year>
            <Month>6</Month>
            <Day>17</Day>
         </DateCreated>
        <Article PubModel="Print-Electronic">
        <Journal>
            <ISSN IssnType="Electronic">1875-8908</ISSN>
            <JournalIssue CitedMedium="Internet">
                <PubDate>
                    <Year>2010</Year>
                    <Month>Jun</Month>
                    <Day>16</Day>
                </PubDate>
            </JournalIssue>
            <Title>Journal of Alzheimer's disease : JAD</Title>
        </Journal>
        <ArticleTitle>CSF Neurofilament Proteins Levels are Elevated in Sporadic Creutzfeldt-Jakob Disease.</ArticleTitle>
        <Pagination>
            <MedlinePgn/>
        </Pagination>
        <Abstract>
            <AbstractText>In this study we investigated the cerebrospinal fluid (CSF) levels of neurofilament light (NFL) and heavy chain (NFHp35), total tau (t-tau), and glial fibrillary acidic protein (GFAP) to detect disease specific profiles in sporadic Creutzfeldt Jakob disease (sCJD) patients and Alzheimer's disease (AD) patients. CSF levels of NFL, NFHp35, t-tau, and GFAP of 23 sCJD patients and 55 AD patients were analyzed and compared to non-demented controls. Median NFL, NFHp35, GFAP, and t-tau levels were significantly increased in sCJD patients and AD patients versus controls (p &lt; 0.0001 in all). NFL, NFHp35, and t-tau levels were significantly increased in sCJD patients versus AD patients (p &lt; 0.005), but GFAP concentrations did not differ between sCJD and AD. The results suggest that neuroaxonal damage, reflected by higher CSF levels of NFL, NFHp35, and t-tau, is more pronounced in the pathophysiology of sCJD than in AD. The comparable CSF GFAP concentrations suggest that astroglial damage or astrocytosis is equally pronounced in the pathophysiology of AD and sCJD. Prospective studies are needed to determine whether NFL and NFHp35 may be additional tools in the differential diagnosis of rapidly progressive dementias.</AbstractText>
        </Abstract>
        <Affiliation>Department of Neurology, Radboud University Nijmegen Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Alzheimer Centre Nijmegen, The Netherlands.</Affiliation>
        <AuthorList>
            <Author>
                <LastName>van Eijk</LastName>
                <ForeName>Jeroen J J</ForeName>
                <Initials>JJ</Initials>
            </Author>
            <Author>
                <LastName>van Everbroeck</LastName>
                <ForeName>Bart</ForeName>
                <Initials>B</Initials>
            </Author>
            <Author>
                <LastName>Abdo</LastName>
                <ForeName>W Farid</ForeName>
                <Initials>WF</Initials>
            </Author>
            <Author>
                <LastName>Kremer</LastName>
                <ForeName>Berry P H</ForeName>
                <Initials>BP</Initials>
            </Author>
            <Author>
                <LastName>Verbeek</LastName>
                <ForeName>Marcel M</ForeName>
                <Initials>MM</Initials>
            </Author>
        </AuthorList>
        <Language>ENG</Language>
        <PublicationTypeList>
            <PublicationType>JOURNAL ARTICLE</PublicationType>
        </PublicationTypeList>
        <ArticleDate DateType="Electronic">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>16</Day>
        </ArticleDate>
    </Article>
    <MedlineJournalInfo>
        <MedlineTA>J Alzheimers Dis</MedlineTA>
        <NlmUniqueID>9814863</NlmUniqueID>
        <ISSNLinking>1387-2877</ISSNLinking>
    </MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
    <History>
        <PubMedPubDate PubStatus="entrez">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="pubmed">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
    </History>
    <PublicationStatus>aheadofprint</PublicationStatus>
    <ArticleIdList>
        <ArticleId IdType="pii">720R60380216K661</ArticleId>
        <ArticleId IdType="doi">10.3233/JAD-2010-090649</ArticleId>
        <ArticleId IdType="pubmed">20555148</ArticleId>
    </ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>

How do I extract the text of the AbstractText element using Perl? Thanks.


Here is a quick and dirty example using XML::Twig.

use 5.012;
use warnings;
use XML::Twig;

XML::Twig->new(
    twig_handlers => {
        AbstractText => sub { say $_->text },
    },
)->parsefile( 'your_data.xml' );
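If the file holds several PubmedArticle elements, the same handler idea can be extended to pair each abstract with its PMID. The following is only a sketch under assumptions: the inline XML is a hypothetical, trimmed-down sample with the same nesting as the file in the question, and `%abstract_for` is a name made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; the real file has many more elements per article.
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Abstract><AbstractText>Second abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

my %abstract_for;    # PMID => abstract text (illustrative name)

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <PubmedArticle> has been parsed.
        PubmedArticle => sub {
            my ( $t, $article ) = @_;
            my $pmid     = $article->first_descendant('PMID');
            my $abstract = $article->first_descendant('AbstractText');

            # Skip articles that lack either element.
            $abstract_for{ $pmid->text } = $abstract->text
                if $pmid && $abstract;

            # Discard what has been parsed so far to keep memory use low.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_: $abstract_for{$_}\n" for sort keys %abstract_for;
```

For a file on disk, `parse($xml)` would be replaced by `parsefile(...)` as in the example above.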


Use an XML parser library. For small files, XML::Simple is enough. For very big files, use XML::Twig or XML::Parser, which can process the document piece by piece instead of loading it all into memory.

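Example using XML::Simple:

use strict; use warnings;
use XML::Simple;
# Perl does not expand "~" in file names, so build the path from $ENV{HOME}.
my $xml = XMLin("$ENV{HOME}/junk/a.xml");
my $AbstractText = $xml->{PubmedArticle}->{MedlineCitation}->{Article}->{Abstract}->{AbstractText};
print "$AbstractText\n";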
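One caveat with the hash path above: by default XML::Simple returns a hash reference for a single PubmedArticle but an array reference when there are several, so the same lookup breaks on larger files. A possible way around that, sketched here with a hypothetical inline sample rather than the real file, is the `ForceArray` option, which makes the named element always an array reference.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample data with the same nesting as the file in the question.
my $xml = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Abstract><AbstractText>Second abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# ForceArray: PubmedArticle is always an array reference, even for one article.
# KeyAttr => []: switch off XML::Simple's folding of elements by key attribute.
my $data = XMLin( $xml, ForceArray => ['PubmedArticle'], KeyAttr => [] );

# Walk every article and pull out its abstract, in document order.
my @abstracts =
    map { $_->{MedlineCitation}{Article}{Abstract}{AbstractText} }
    @{ $data->{PubmedArticle} };

print "$_\n" for @abstracts;
```

This assumes each AbstractText is a plain text element with no attributes, as in the question's sample. If an abstract carries attributes or is split into several AbstractText elements, XML::Simple returns a hash or array reference there instead of a string, and the lookup needs adjusting.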