How to test for existence of child nodes using Python to iterate over XML (using xml.dom.minidom)

2023-02-23 08:22

I am using Python and xml.dom.minidom to iterate over an exported Excel spreadsheet, outputting an HTML table for our dining hall menu with various calls to .write. The difficulty is that the XML Excel outputs isn't structured. To compensate, I have set up a number of variables (day, previousDay, meal etc.) that get set when I encounter child nodes whose nodeValue matches something I am testing against. I have a bunch of if statements to determine when to start a new table (for each day of the week), or a new row (when day != previousDay) and so on.

I am having difficulty figuring out how to ignore particular nodes, though. There are a handful of nodes in the Excel output that I need to ignore, and I can identify them by their child nodes having particular values, but I can't figure out how to implement it.

Basically, I need the following if statement in my main for loop:

for node in dome.getElementsByTagName('data'):  
    if node contains childNode with nodeValue == 'test':
        do something

My quick inclination is to have a nested for-loop with a get-out-of-node-free-card (um, exception) like the following.

class BadNodeException(Exception):
    pass

for node in dome.getElementsByTagName('data'):
    try:
        for child in node.childNodes:
            if child.nodeValue == 'test':
                raise BadNodeException
        ## process node as normal
    except BadNodeException:
        pass

Do you have to use xml.dom.minidom? Because this is the kind of thing that XPath shines at. Using lxml.etree, for instance, this finds all of the elements you want:

my_elements = document.xpath("//data[not(*[.='test'])]")

The W3C's DOM is really hard to use for real-world problems, because it doesn't include simple things like an attribute returning an element's value. (XPath declares that an element's value is all of its child text nodes concatenated together, which is why the above pattern works.)
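To make that point concrete, here is a small sketch using only the standard library. The sample XML and the element name `item` are made up for illustration. In minidom an element's `nodeValue` is `None`; the text lives on a child text node, which is probably why comparing `child.nodeValue` to a string never matches when the child is an element.

```python
from xml.dom import minidom

# Hypothetical sample document, not taken from the original Excel export.
doc = minidom.parseString("<data><item>test</item></data>")
item = doc.getElementsByTagName("item")[0]

element_value = item.nodeValue          # elements carry no value of their own
text_value = item.firstChild.nodeValue  # the value sits on the child text node

print(element_value, text_value)
```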

You'll need to implement a helper function for that sort of thing, e.g.:

from xml.dom import Node

def element_text(e):
    return "".join(t.nodeValue for t in e.childNodes if t.nodeType == Node.TEXT_NODE)

This makes it easier to build a filter function, e.g.:

def element_is_of_interest(e):
    return not any(element_text(c) == "test" for c in e.childNodes)

and get your elements like this:

my_elements = list(filter(element_is_of_interest, d.getElementsByTagName("data")))
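Putting those pieces together, a self-contained sketch might look like the following. The sample XML and the `cell` element name are assumptions standing in for the real Excel export, and the helpers are repeated so the block runs on its own.

```python
from xml.dom import minidom, Node

# Hypothetical sample input standing in for the Excel export.
SAMPLE = """<root>
  <data><cell>test</cell><cell>ignore me</cell></data>
  <data><cell>Monday</cell><cell>Soup</cell></data>
</root>"""

def element_text(e):
    # Concatenate the values of e's direct text-node children.
    return "".join(t.nodeValue for t in e.childNodes
                   if t.nodeType == Node.TEXT_NODE)

def element_is_of_interest(e):
    # True when no child of e has the text "test".
    return not any(element_text(c) == "test" for c in e.childNodes)

d = minidom.parseString(SAMPLE)
my_elements = [e for e in d.getElementsByTagName("data")
               if element_is_of_interest(e)]

# Collect the cell texts of the rows that were kept.
kept = [[element_text(c) for c in e.getElementsByTagName("cell")]
        for e in my_elements]
print(kept)
```

Only the second `data` element survives, because the first one has a child whose text is exactly "test".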

Have you considered using a SAX parser instead? SAX parsers process the XML tree in the order the nodes appear (depth first) and let you handle each node's value at the point where it is parsed.

xml.sax.xmlreader.XMLReader
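As a rough sketch of the SAX approach, the handler below buffers the cells of each `data` element and drops the row when one of them is "test". The sample XML, the `cell` element name, and the `DataHandler` class are all hypothetical; only `xml.sax.ContentHandler` and `xml.sax.parseString` come from the standard library.

```python
import xml.sax

# Hypothetical sample input standing in for the Excel export.
SAMPLE = b"""<root>
  <data><cell>test</cell><cell>ignore me</cell></data>
  <data><cell>Monday</cell><cell>Soup</cell></data>
</root>"""

class DataHandler(xml.sax.ContentHandler):
    """Collect the cell texts of each <data>, skipping rows containing "test"."""

    def __init__(self):
        super().__init__()
        self.rows = []       # rows that were kept
        self.current = None  # cell texts of the <data> being read
        self.buffer = None   # text pieces of the <cell> being read

    def startElement(self, name, attrs):
        if name == "data":
            self.current = []
        elif name == "cell" and self.current is not None:
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text run in several calls, so accumulate.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "cell" and self.buffer is not None:
            self.current.append("".join(self.buffer))
            self.buffer = None
        elif name == "data" and self.current is not None:
            if "test" not in self.current:
                self.rows.append(self.current)
            self.current = None

handler = DataHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.rows)
```

The decision to keep or drop a row is made in `endElement`, once all of its cells have been seen, so no exception is needed to bail out of a node.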
