
XML Cross Reference

I have an XML file that contains IDs and a second XML file that contains the same IDs along with other data. I would like to cross-reference the two files and extract information from the second one. The first file contains only the IDs that I need. For example, the first file contains the IDs 345, 350, 353, 356 and the second file contains the IDs 345, 346, 347, 348, 349, 350 ... For each ID in the first file, I want to extract the matching data node and all of its children from the second file.

The first file structure:

<data>
    <node>
        <info>info</info>
        <id>345</id>
    </node>
    <node2>
        <node3>
                <info2>info</info2>
                <id>2</id>
        </node3>
        <otherinfo>1</otherinfo>
        <text type = "02">
                <role>info</role>
                <st>1</st>
        </text>
    </node2>
</data>

The second file structure:

<data>
    <node>
        <info>info</info>
        <id>345</id>
    </node>
    <node2>And a bunch of other nodes</node2>
    <node2>And a bunch of other nodes</node2>
    <node2>And a bunch of other nodes</node2>
</data>

I have tried a Ruby/Nokogiri solution, but I can't seem to get very far. I'm open to solutions in any scripting language.


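To extract all id values from the first XML document, held here as a string in xml1:

from lxml import etree

# Parse the first document and collect the text of every <id> element
e1 = etree.fromstring(xml1)
ids = e1.xpath('//id/text()')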

To extract every <node> element from the second XML string (xml2) whose id child has one of the id values collected from the first one:

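import re

e2 = etree.fromstring(xml2)
# EXSLT regular-expressions namespace, which lxml supports in XPath
ns_re = dict(re="http://exslt.org/regular-expressions")
# Join the escaped ids into one alternation, e.g. 345|350
re_id = "|".join(map(re.escape, ids))
# Select each <node> that is the parent of an <id> matching one id exactly
nodes = e2.xpath("//id[re:test(.,'^(?:%s)$')]/parent::node" % re_id,
                 namespaces=ns_re)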
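
If lxml is not available, the same cross-reference can be sketched with the standard library alone. This is a different technique from the XPath above: it uses a plain set lookup instead of an EXSLT regular expression. The sample strings XML1 and XML2 and the helper names wanted_ids and matching_nodes are made up for illustration, and the sketch assumes that each id of interest is a direct child of a <node> element directly under the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the structures in the question.
XML1 = """<data>
    <node><info>info</info><id>345</id></node>
    <node><info>info</info><id>350</id></node>
</data>"""

XML2 = """<data>
    <node><info>a</info><id>345</id></node>
    <node><info>b</info><id>346</id></node>
    <node><info>c</info><id>350</id></node>
</data>"""


def wanted_ids(xml_text):
    """Return the set of <id> texts found directly under top-level <node> elements."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iterfind("./node/id") if el.text}


def matching_nodes(xml_text, ids):
    """Return the top-level <node> elements whose <id> child text is in ids."""
    root = ET.fromstring(xml_text)
    return [node for node in root.iterfind("./node")
            if (node.findtext("id") or "").strip() in ids]


ids = wanted_ids(XML1)
nodes = matching_nodes(XML2, ids)

# Serialise each matching <node>, including all of its children.
for node in nodes:
    print(ET.tostring(node, encoding="unicode").strip())
```

To work with files rather than strings, ET.parse(path).getroot() can stand in for ET.fromstring(xml_text). A set lookup also sidesteps quoting problems if an id ever contains characters that are special in a regular expression or in an XPath string literal.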