How to test for existence of child nodes using Python to iterate over XML (using xml.dom.minidom)
I am using Python and xml.dom.minidom to iterate over an exported Excel spreadsheet, outputting an HTML table for our dining hall menu with various calls to .write. The difficulty is that the XML Excel outputs isn't structured. To compensate, I have set up a number of variables (day, previousDay, meal, etc.) that get set when I encounter child nodes whose nodeValue matches what I'm testing for. I have a bunch of if statements to determine when to start a new table (for each day of the week), a new row (when day != previousDay), and so on.
I am having difficulty figuring out how to ignore particular nodes, though. There are a handful of nodes in Excel's output that I need to ignore. I can identify them by their child nodes having particular values, but I can't figure out how to implement it.
Basically, I need the following if statement in my main for loop:
for node in dome.getElementsByTagName('data'):
    if node contains childNode with nodeValue == 'test':
        do something
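A minimal sketch of that if statement in real minidom code, assuming the value to match is text sitting directly inside each data element. The helper name has_child_with_value and the sample XML are hypothetical; any() stops scanning at the first matching child.

```python
from xml.dom import minidom


def has_child_with_value(node, value):
    """Hypothetical helper: True if any direct child of node has nodeValue == value.

    Only character-data children (such as text nodes) have a nodeValue;
    for element children nodeValue is None, so they never match here.
    """
    return any(child.nodeValue == value for child in node.childNodes)


# Made-up sample standing in for the Excel export.
dom = minidom.parseString("<root><data>test</data><data>Monday</data></root>")

kept = []
for node in dom.getElementsByTagName("data"):
    if has_child_with_value(node, "test"):
        continue  # skip the nodes we want to ignore
    kept.append(node.firstChild.nodeValue)  # "process" the node
```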
My quick inclination is to have a nested for-loop with a get-out-of-node-free-card (um, exception) like the following.
class BadNodeException(Exception):
    pass

for node in dome.getElementsByTagName('data'):
    try:
        for child in node.childNodes:
            if child.nodeValue == 'test':
                raise BadNodeException
        ## process node as normal
    except BadNodeException:
        pass
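The custom exception isn't strictly needed: Python's for...else runs the else clause only when the loop finishes without hitting break. A minimal sketch of the same loop written that way, with made-up sample XML standing in for the Excel export:

```python
from xml.dom import minidom

dome = minidom.parseString(
    "<root><data>test</data><data>Monday</data><data>Lunch</data></root>"
)

processed = []
for node in dome.getElementsByTagName("data"):
    for child in node.childNodes:
        if child.nodeValue == "test":
            break  # found a marker child: stop scanning this node
    else:
        # Runs only if the inner loop finished without break,
        # i.e. no child matched, so this node should be processed.
        processed.append(node.firstChild.nodeValue)
```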
Do you have to use xml.dom.minidom? Because this is the kind of thing that XPath shines at. Using lxml.etree, for instance, this finds all of the elements you want:
my_elements = document.xpath("//data[not(*[.='test'])]")
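lxml is a third-party package. If only the standard library is available, xml.etree.ElementTree's limited XPath support doesn't include functions such as not(), so here is a sketch that emulates the same filter in plain Python. The helper string_value and the sample XML are hypothetical:

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    # Like XPath's string-value: all text inside elem, concatenated.
    return "".join(elem.itertext())


root = ET.fromstring(
    "<root>"
    "<data><v>test</v></data>"
    "<data><v>Monday</v><v>Lunch</v></data>"
    "</root>"
)

# Rough equivalent of //data[not(*[.='test'])]: every <data> element
# (iter() walks the whole tree) with no child element whose text is 'test'.
my_elements = [
    data for data in root.iter("data")
    if not any(string_value(child) == "test" for child in data)
]
```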
The W3C's DOM is really hard to use for real-world problems, because it doesn't include simple things like a property that returns an element's text value. (XPath defines an element's string-value as all of its descendant text nodes concatenated together, which is why the above pattern works.)
You'll need to implement a helper function for that sort of thing, e.g.:
from xml.dom import Node

def element_text(e):
    # Concatenate e's direct text children (deeper descendants are ignored).
    return "".join(t.nodeValue for t in e.childNodes if t.nodeType == Node.TEXT_NODE)
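Note that element_text only looks at direct text children, whereas XPath's string-value includes text from all descendants. A recursive variant, sketched here under the hypothetical name element_string_value, follows the XPath behavior more closely:

```python
from xml.dom import Node, minidom


def element_string_value(e):
    """Hypothetical helper: concatenate all descendant text, like XPath's string-value."""
    parts = []
    for child in e.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.nodeValue)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(element_string_value(child))  # recurse into sub-elements
    return "".join(parts)


doc = minidom.parseString("<data>Mon<b>day</b></data>")
data = doc.documentElement
```

Here element_string_value(data) gives "Monday", while element_text(data) would give only "Mon".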
This makes it easier to build a filter function, e.g.:
def element_is_of_interest(e):
    # True unless some child of e has the text "test".
    return not any(element_text(c) == "test" for c in e.childNodes)
and get your elements like this:
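# d is the parsed minidom document. In Python 3, filter() returns an
# iterator, so wrap it in list() if you need to reuse the result.
my_elements = list(filter(element_is_of_interest, d.getElementsByTagName("data")))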
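Putting the pieces together, here is a self-contained run of the filter on a small made-up document. The cell element name is an assumption for illustration, not something Excel necessarily emits:

```python
from xml.dom import Node, minidom


def element_text(e):
    # Direct text children only, as in the helper above.
    return "".join(t.nodeValue for t in e.childNodes if t.nodeType == Node.TEXT_NODE)


def element_is_of_interest(e):
    # Keep e unless one of its children has the text "test".
    return not any(element_text(c) == "test" for c in e.childNodes)


d = minidom.parseString(
    "<root>"
    "<data><cell>test</cell><cell>ignore me</cell></data>"
    "<data><cell>Monday</cell><cell>Lunch</cell></data>"
    "</root>"
)

my_elements = list(filter(element_is_of_interest, d.getElementsByTagName("data")))
kept_text = [[element_text(c) for c in e.childNodes] for e in my_elements]
```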
Have you considered using a SAX parser instead? A SAX parser doesn't build a tree; it reports the document as a stream of events (start tag, text, end tag) in the order the nodes appear, which is depth-first, and lets you handle each node's value at the point where it is parsed. See xml.sax.xmlreader.XMLReader.
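A minimal SAX sketch, assuming data elements are never nested inside one another. The hypothetical handler DataFilterHandler tracks whether the current data element has a child element whose text is 'test', and keeps the text of every data element that doesn't:

```python
import xml.sax


class DataFilterHandler(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the text of <data> elements with no child whose text is skip_value."""

    def __init__(self, skip_value="test"):
        super().__init__()
        self.skip_value = skip_value
        self.kept = []          # text of each <data> element we keep
        self._in_data = False
        self._data_text = []    # all text seen inside the current <data>
        self._child_text = []   # text inside the current direct child element
        self._bad = False       # set once a child with skip_value is seen
        self._depth = 0         # nesting depth below the current <data>

    def startElement(self, name, attrs):
        if name == "data":
            self._in_data, self._bad, self._depth = True, False, 0
            self._data_text = []
        elif self._in_data:
            self._depth += 1
            if self._depth == 1:      # a new direct child begins
                self._child_text = []

    def characters(self, content):
        # characters() may be called several times per text run; we join later.
        if self._in_data:
            self._data_text.append(content)
            if self._depth >= 1:
                self._child_text.append(content)

    def endElement(self, name):
        if name == "data" and self._in_data:
            self._in_data = False
            if not self._bad:
                self.kept.append("".join(self._data_text))
        elif self._in_data:
            if self._depth == 1 and "".join(self._child_text) == self.skip_value:
                self._bad = True
            self._depth -= 1


handler = DataFilterHandler()
xml.sax.parseString(
    b"<root><data><v>test</v></data><data><v>Monday</v><v>Lunch</v></data></root>",
    handler,
)
```

Because nothing is kept in memory except the current element's text, this scales to large exports where building a full DOM would be wasteful.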