Find element with attribute with minidom
Given
<field name="frame.time_delta_displayed" showname="Time delta from previous displayed frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.time_relative" showname="Time since reference or first frame: 0.000008000 seconds" size="0" pos="0" show="0.000008000"/>
<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>
<field name="frame.pkt_len" showname="Packet Length: 1506 bytes" hide="yes" size="0" pos="0" show="1506"/>
<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.cap_len" showname="Capture Length: 1506 bytes" size="0" pos="0" show="1506"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.protocols" showname="Protocols in frame: eth:ip:tcp:http:data" size="0" pos="0" show="eth:ip:tcp:http:data"/>
How do I get the field with name="frame.len" right away without iterating through every tag and checking the attributes?
I don't think you can.
From the parent element, you need to loop over the field elements and compare each one's name attribute:
for subelement in element.getElementsByTagName("field"):
    if subelement.getAttribute("name") == "frame.len":
        do_something()
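For reference, here is a self-contained sketch of that loop using xml.dom.minidom. The sample document, its <proto> wrapper element, and the helper name find_fields_by_name are made up for illustration; only parseString, getElementsByTagName and getAttribute come from minidom itself.

```python
from xml.dom import minidom

# Hypothetical sample: two of the question's fields inside an assumed <proto> wrapper.
SAMPLE = (
    '<proto>'
    '<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>'
    '<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>'
    '</proto>'
)


def find_fields_by_name(parent, wanted):
    """Return all <field> descendants of parent whose name attribute equals wanted."""
    return [
        field
        for field in parent.getElementsByTagName("field")
        if field.getAttribute("name") == wanted
    ]


dom = minidom.parseString(SAMPLE)
matches = find_fields_by_name(dom.documentElement, "frame.len")
print(matches[0].getAttribute("show"))
```

The loop is still there, but it only visits <field> elements, and getAttribute returns an empty string for elements that lack the attribute, so no separate hasAttribute check is needed for this comparison.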
Reacting to your comment from March 11, if the structure of your documents is stable and free of nasty surprises (like angle brackets inside attributes), you might want to try the unthinkable and use a regular expression. This is not recommended practice but could work and be much easier than actually parsing the file. I admit that I've done that sometimes myself. Haven't gone blind yet.
So in your case you could do this (assuming that a <field> tag doesn't span multiple lines):
import re

with open("myfile.xml") as xmlfile:
    for line in xmlfile:
        # The dot is escaped so it only matches a literal "."
        match = re.search(r'<field\s+name="frame\.len"\s+([^>]+)/>', line)
        if match:
            result = match.group(1)
            do_something(result)
If a <field> tag can span multiple lines, you could try loading the entire file as plain text into memory and then scan it for matches:
import re

with open("myfile.xml") as xmlfile:
    filedump = xmlfile.read()
# \s+ and [^>]+ both match newlines, so tags split across lines are found too
for match in re.finditer(r'<field\s+name="frame\.len"\s+([^>]+)/>', filedump):
    result = match.group(1)
    do_something(result)
In both cases, result will contain the rest of the tag's attributes as one string, i.e. everything after name="frame.len". The regex assumes that name is always the first attribute inside the tag.
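To make concrete what the captured group holds, here is a small sketch that runs the pattern against an in-memory string instead of a file. The two sample lines are taken from the question; the names SAMPLE, PATTERN, results and attrs, and the second regex that splits the captured text into a dict, are illustrative additions rather than part of the approach above.

```python
import re

# Two fields from the question; only the second has name="frame.len".
SAMPLE = (
    '<field name="frame.pkt_len" showname="Packet Length: 1506 bytes" '
    'hide="yes" size="0" pos="0" show="1506"/>\n'
    '<field name="frame.len" showname="Frame Length: 1506 bytes" '
    'size="0" pos="0" show="1506"/>\n'
)

PATTERN = re.compile(r'<field\s+name="frame\.len"\s+([^>]+)/>')

# Each captured group is the raw attribute text following the name attribute.
results = [m.group(1) for m in PATTERN.finditer(SAMPLE)]

# Assumption: attribute values are double-quoted and contain no double quotes.
attrs = dict(re.findall(r'(\w+)="([^"]*)"', results[0]))
print(attrs["show"])
```

This only holds up under the same caveats as before: a stable document layout with name as the first attribute and no angle brackets inside attribute values.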
You don't. The DOM API, somewhat poorly designed (by the W3C, not by Python!), doesn't have such a search function to do the iteration for you. Either accept the need to loop (not through every tag in general, but through all with a given tag name), or upgrade to a richer interface, such as BeautifulSoup or lxml.
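As an illustration of what a richer interface buys you, here is a sketch that uses the standard library's xml.etree.ElementTree in place of BeautifulSoup or lxml (lxml.etree offers a largely compatible API plus full XPath). The sample document and its <proto> wrapper are assumptions. The library still walks the tree internally, but the attribute test moves into the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two of the question's fields inside an assumed <proto> wrapper.
SAMPLE = (
    '<proto>'
    '<field name="frame.number" showname="Frame Number: 2" size="0" pos="0" show="2"/>'
    '<field name="frame.len" showname="Frame Length: 1506 bytes" size="0" pos="0" show="1506"/>'
    '</proto>'
)

root = ET.fromstring(SAMPLE)

# find() returns the first matching element, or None if nothing matches.
field = root.find(".//field[@name='frame.len']")
print(field.get("show"))
```

Use findall with the same path if more than one element can carry that name.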
Wow, that regex is horrible! As of 2016, there is a .getAttribute() method for each DOM element that makes things a bit easier, but you still have to iterate through the elements.
# elements is the list to search, e.g. the result of getElementsByTagName('field')
l = []
for e in elements:
    if e.hasAttribute('name') and e.getAttribute('name') == 'frame.len':
        l.append(e)