Parsing XML to a hash table

2022-12-14 05:45 问答作者：

I have an XML file in the following format:

<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
   </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>

<type...>
</type>
</doc>

I would like to parse this document and build a hash table

{X: {"开发者_C百科A": [(100,80), (200,90)], "B": [(100,20), (20,90)]}, Y: .....}

How would I do this in Python?

I disagree with the suggestion in other answers to use minidom -- that's a so-so Python adaptation of a standard originally conceived for other languages, usable but not a great fit. The recommended approach in modern Python is ElementTree.

The same interface is also implemented, faster, in third party module lxml, but unless you need blazing speed the version included with the Python standard library is fine (and faster than minidom anyway) -- the key point is to program to that interface, then you can always switch to a different implementation of the same interface in the future if you want to, with minimal changes to your own code.

For example, after the needed imports &c, the following code is a minimal implementation of your example (it does not verify that the XML is correct, just extracts the data assuming correctness -- adding various kinds of checks is pretty easy of course):

from xml.etree import ElementTree as et  # or, import any other, faster version of ET

def xml2data(xmlfile):
  tree = et.parse(xmlfile)
  data = {}
  for anid in tree.getroot().getchildren():
    currdict = data[anid.get('name')] = {}
    for atype in anid.getchildren():
      currlist = currdict[atype.get('name')] = []
      for c in atype.getchildren():
        currlist.append((c.get('val'), c.get('id')))
  return data

This produces your desired result given your sample input.

Do not reinvent the wheel. Use Amara toolkit. Variable names are just keys in a dictionary anyway. http://www.xml3k.org/Amara

As others have stated minidom is the way to go here. You open (and parse) the file, while going through the nodes you check if its relevant and should be read. That way, you also know if you want to read the child nodes.

Threw together this, seems to do what you want. Some of the values are read by attribute position rather than attribute name. And theres no error handling. And the print () at the end means its Python 3.x.

I'll leave it as an exercise to improve upon that, just wanted to post a snippet to get you started.

Happy hacking! :)

xml.txt

<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
   </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>
</doc>

parsexml.py

from xml.dom import minidom
data={}
doc=minidom.parse("xml.txt")
for n in doc.childNodes[0].childNodes:
    if n.localName=="id":
        id_name = n.attributes.item(0).nodeValue
        data[id_name] = {}
        for j in n.childNodes:
            if j.localName=="type":
                type_name = j.attributes.item(0).nodeValue
                data[id_name][type_name] = [(),()]
                for k in j.childNodes:
                    if k.localName=="min":
                        data[id_name][type_name][0] = \
                            (k.attributes.item(1).nodeValue, \
                             k.attributes.item(0).nodeValue)
                    if k.localName=="max":
                        data[id_name][type_name][1] = \
                            (k.attributes.item(1).nodeValue, \
                             k.attributes.item(0).nodeValue)
print (data)

Output:

{'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}

I would recommend using the minidom library.

The docs are pretty good so you should be up and running in no time.

Dan.

Another XML parsing library: http://www.crummy.com/software/BeautifulSoup/

Parsing XML documentation starts here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20XML

Why not try something like the PyXml library. They have lots of documentation and tutorials.

继续阅读：dom python xml

Parsing XML to a hash table

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？