
python xml query get parent

I have a big xml document that looks like this:

<Node name="foo">
    <Node name="16764764625">
        <Val name="type"><s>3</s></Val>
        <Val name="owner"><s>1</s></Val>
        <Val name="location"><s>4</s></Val>
        <Val name="brb"><n/></Val>
        <Val name="number"><f>24856</f></Val>
        <Val name="number2"><f>97000.0</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="type"><s>1</s></Val>
        <Val name="owner"><s>2</s></Val>
        <Val name="location"><s>6</s></Val>
        <Val name="brb"><n/></Val>
        <Val name="number"><f>265456</f></Val>
        <Val name="number2"><f>99000.0</f></Val>
    </Node>
    ...
</Node>

My goal is to get the name of the parent Node (1764466544, the name attribute of the 2nd Node) by searching for the Node whose child Val name="number" contains 265456.

I've been doing a heap of reading on XPath and ElementTree, but I am still not sure where to start to actually query this. I've been looking for examples, but I can't find any that return a parent node as the result.

I'm still new to Python, so any suggestions would be appreciated.

Thanks


Unfortunately, when using the ElementTree API, each Element object has no reference back to its parent, so you cannot go up the tree from a known point. Instead, you have to find the possible parent objects and filter the ones you want.

This is commonly done with XPath expressions. However, ElementTree only supports a subset of XPath (see the docs), the most useful parts of which were only added in ElementTree 1.3, which only comes with Python 2.7+ or 3.2+.

Even then, as far as I could find, ElementTree's XPath cannot work with your file as is: it offers no way to select based on the text of a node, only on its attributes (or attribute values).

My experimentation has found only two ways to proceed with ElementTree. If you are using Python 2.7+ (or can download and install a newer version of ElementTree to work with older Python versions), and you can modify the format of the XML file to put the numbers in attributes, like so:

<Val name="number"><f val="265456" /></Val>

then the following Python code will pull out the nodes of interest:

import xml.etree.ElementTree as ETree
tree = ETree.ElementTree(file='sample.xml')
# Match the <f> element by its attribute, then step up twice (f -> Val -> Node).
nodes = tree.findall(".//Node/Val[@name='number']/f[@val='265456']/../..")

For older Pythons, or if you cannot modify the XML format, you will have to filter the invalid nodes manually. The following worked for me:

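import xml.etree.ElementTree as ETree
tree = ETree.ElementTree(file='sample.xml')
all_nodes = tree.findall(".//Node")  # avoid shadowing the built-in all()
nodes = []

# Keep each Node that has a <Val name="number"> child whose first child's text is 265456.
for node in all_nodes:
    for val in node:  # iterate direct children (getchildren() was removed in Python 3.9)
        if val.get('name') == 'number' and len(val) and val[0].text == '265456':
            nodes.append(node)
            break  # one match is enough; don't append the same node twice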

Neither of these solutions is what I would call ideal, but they're the only ones I have been able to make work with the ElementTree library (since that is what you mentioned using). You might be better off using a third-party library instead of the built-in ones; see the Python wiki entry on XML for a list of options. lxml provides Python bindings for the widely used libxml2 library, and it is the one I would suggest looking at first. It supports XPath 1.0, so you should be able to use the queries from the other answers.
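Note that the ElementTree bundled with current Python 3 releases also supports a `[tag='text']` predicate, which selects an element that has a child named `tag` whose complete text equals the given value. Combined with `..`, this can query the unmodified file directly. A minimal sketch, assuming a trimmed copy of the question's XML inlined as a string; `parent_names` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed sample from the question; in practice use ET.parse('sample.xml').getroot().
XML = """
<Node name="foo">
    <Node name="16764764625">
        <Val name="type"><s>3</s></Val>
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="type"><s>1</s></Val>
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>
"""

def parent_names(root, number):
    # [@name='number'] filters on the attribute, [f='...'] filters on the
    # complete text of the <f> child, and '..' steps up to the enclosing Node.
    # `number` must not contain a single quote, since it is pasted into the path.
    path = ".//Node/Val[@name='number'][f='%s']/.." % number
    return [node.get('name') for node in root.findall(path)]

root = ET.fromstring(XML)
print(parent_names(root, '265456'))  # ['1764466544']
```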


This XPath:

/Node/Node[Val[@name='number']/f='265456']/@name

Outputs:

1764466544
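ElementTree cannot evaluate that expression (its path language has no attribute step such as `/@name`), but lxml implements XPath 1.0 and can. A minimal sketch, assuming lxml is installed (`pip install lxml`) and using a trimmed copy of the question's XML:

```python
from lxml import etree

# Trimmed sample from the question; in practice use etree.parse('sample.xml').
XML = b"""<Node name="foo">
    <Node name="16764764625">
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>"""

root = etree.fromstring(XML)
# Select the inner Node whose Val[@name='number'] has an <f> child with
# string value '265456', and return that Node's name attribute.
# lxml returns "smart string" results; str() turns them into plain strings.
names = [str(n) for n in root.xpath("/Node/Node[Val[@name='number']/f='265456']/@name")]
print(names)  # ['1764466544']
```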


The following function has helped me in similar cases. As the docstring explains, it doesn't work in the general case, but if your nodes are unique, it should help.

import xml.etree.ElementTree


def get_element_ancestry(root, element):
    '''Return a list of ancestor Elements for the given element.

    If both root and element are of type xml.etree.ElementTree.Element, and if
    the given root contains the given element as a descendant, then return a
    list of direct xml.etree.ElementTree.Element ancestors, starting with root
    and ending with element. Otherwise, return an empty list.

    The xml.etree.ElementTree module offers no function to return the parent of
    a given Element, presumably because an Element may be in more than one tree,
    or even multiple times within a given tree, so its parent depends on the
    context. This function provides a solution in the specific cases where the
    caller either knows that the given element appears just once within the
    tree or is satisfied with the first branch to reference the given element.
    '''
    result = []
    xet = xml.etree.ElementTree
    if not xet.iselement(root) or not xet.iselement(element):
        return result
    # Path matching descendants with the same tag and attribute values.
    xpath = './/' + element.tag \
        + ''.join(["[@%s='%s']" % a for a in element.items()])
    parent = root
    while parent is not None:
        result.append(parent)
        for child in parent.findall('*'):
            if child is element:
                result.append(element)
                return result
            # Descend into the child whose subtree contains the element.
            if element in child.findall(xpath):
                parent = child
                break
        else:
            # No child leads to the element, so it is not under root.
            return []
    return result
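A simpler general-purpose alternative (a different technique from the function above, not part of the original answer) is to build a child-to-parent dictionary once with `Element.iter()`, then look up any element's parent directly. A minimal sketch using a trimmed copy of the question's XML; `build_parent_map` and `find_parent_name` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

XML = """<Node name="foo">
    <Node name="16764764625">
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>"""

def build_parent_map(root):
    # Iterating an Element yields its direct children, so this maps every
    # element in the tree (except root) to the element that contains it.
    return {child: parent for parent in root.iter() for child in parent}

def find_parent_name(root, number):
    parent_map = build_parent_map(root)
    for f in root.iter('f'):
        val = parent_map[f]
        if val.get('name') == 'number' and f.text == number:
            # Step up from <Val> to its enclosing <Node>.
            return parent_map[val].get('name')
    return None

root = ET.fromstring(XML)
print(find_parent_name(root, '265456'))  # 1764466544
```

The map costs one pass over the tree, after which each parent lookup is a dictionary access, which suits repeated queries on a large document.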


When using a DOM parser, the attribute

node.parentNode

returns a reference to the parent node.
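A minimal sketch with the standard library's xml.dom.minidom, assuming the question's structure (the value sits in the text of an `<f>` element inside `<Val name="number">`); `parent_name_for` is a hypothetical helper name:

```python
from xml.dom import minidom

XML = """<Node name="foo">
    <Node name="16764764625">
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>"""

def parent_name_for(doc, number):
    for val in doc.getElementsByTagName('Val'):
        if val.getAttribute('name') != 'number':
            continue
        for f in val.getElementsByTagName('f'):
            # The value is stored in the text node(s) inside <f>.
            text = ''.join(n.data for n in f.childNodes if n.nodeType == n.TEXT_NODE)
            if text == number:
                # DOM nodes keep a parentNode reference, so going up is direct.
                return val.parentNode.getAttribute('name')
    return None

doc = minidom.parseString(XML)
print(parent_name_for(doc, '265456'))  # 1764466544
```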

For how to select a parent node with XPath, see

http://www.tizag.com/xmlTutorial/xpathparent.php
