Python: How to create nested XML from a flat datastructure
I would like to create nested XML (as a string) from a list of dicts with Python:
toc = [
    {'entryno': 1, 'level': 1, 'pageno': 17, 'title': 'title a'},
    {'entryno': 2, 'level': 2, 'pageno': 19, 'title': 'title b'},
    {'entryno': 3, 'level': 1, 'pageno': 25, 'title': 'title c'},
]
level is the nesting level, and there might be more than 2 levels in my list. The toc has a fixed ordering (by entryno). The level can only increase by one from one entry to the next, but it can decrease by more than one. Here is the nested example XML I want to create:
<entry id="1">
    <pageno>17</pageno>
    <title>title a</title>
    <entry id="2">
        <pageno>19</pageno>
        <title>title b</title>
    </entry>
</entry>
<entry id="3">
    <pageno>25</pageno>
    <title>title c</title>
</entry>
I tried to solve this with string.Template() and by iterating over the toc, but I got stuck at creating the nested part of the XML. I suspect the solution will be some recursive stuff.
As a programming beginner I am not just interested in a solution but also in your train of thought to solve this!
Non-ugly solution using the ElementTree API. One implementation is included with Python, as xml.etree.ElementTree (Python 2 also shipped the faster xml.etree.cElementTree). Another is lxml.etree, which provides more functionality, including pretty-printing the output.
# import xml.etree.ElementTree as et
import lxml.etree as et
import sys

toc = [
    {'entryno': 1, 'level': 1, 'pageno': 17, 'title': 'title a'},
    {'entryno': 2, 'level': 2, 'pageno': 19, 'title': 'title b'},
    {'entryno': 3, 'level': 1, 'pageno': 25, 'title': 'Smith & Wesson'},
    {'entryno': 4, 'level': 2, 'pageno': 27, 'title': '<duct tape>'},
    {'entryno': 5, 'level': 2, 'pageno': 29, 'title': u'\u0404'},
]

root = et.Element("root")
tree = et.ElementTree(root)
# most recent element seen at each level; level 0 is the root
parent = {0: root}
for entry in toc:
    level = entry['level']
    entryno = entry['entryno']
    # create the element and link it to its parent
    elem = et.SubElement(parent[level - 1], "entry", {'id': str(entryno)})
    # create children to hold the other data items
    for k, v in entry.items():
        if k in ('entryno', 'level'):
            continue
        child = et.SubElement(elem, k)
        child.text = str(v)
    # record current element as a possible parent
    parent[level] = elem

# write() emits bytes here, so use the binary stdout stream
# tree.write(sys.stdout.buffer)  # xml.etree.ElementTree: no pretty_print
tree.write(sys.stdout.buffer, pretty_print=True)
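Since the question asks for the XML as a string, here is a minimal sketch of the same parent-per-level idea wrapped in a function that uses only the standard library. The function name toc_to_xml, the wrapping root element and the sample data are assumptions made for illustration; they are not part of the answer above.

```python
import xml.etree.ElementTree as ET


def toc_to_xml(toc):
    """Return nested <entry> XML (as str) for a flat, ordered list of dicts."""
    root = ET.Element("root")
    parent = {0: root}  # most recent element seen at each level
    for entry in toc:
        level = entry["level"]
        elem = ET.SubElement(
            parent[level - 1], "entry", {"id": str(entry["entryno"])}
        )
        for key, value in entry.items():
            if key in ("entryno", "level"):
                continue
            ET.SubElement(elem, key).text = str(value)
        parent[level] = elem
    # encoding="unicode" makes tostring return str instead of bytes
    return ET.tostring(root, encoding="unicode")


# hypothetical sample data, including characters that need escaping
sample = [
    {"entryno": 1, "level": 1, "pageno": 17, "title": "title a"},
    {"entryno": 2, "level": 2, "pageno": 19, "title": "Smith & Wesson"},
    {"entryno": 3, "level": 1, "pageno": 25, "title": "<duct tape>"},
]
xml_text = toc_to_xml(sample)
print(xml_text)
```

The special characters in the titles are escaped by ElementTree itself, which is one reason to prefer it over assembling the markup by hand.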
Let's assume that you know how to create XML.
Let's assume that 'level' in your data increases if the data is nested in the previous node, and it only increases by 1. If the level decreases, this means that you are not talking about the current node any more, but rather about some node above; level == 1 means 'attach at document level'.
To handle increasing levels, you just need to track the previous node. If the level increases by one, you create a new node and make it a child of the previous node.
To handle the same level, you need to remember the parent of the previously created node. You attach the new node to that parent, because it's a peer of the previous node.
To handle decreasing levels, you need to step back from the previous node several steps so that you're on the right level. Can you see a pattern?
Really you need to remember the whole chain from document level to the previously created node. If next_node.level == previous_node.level + 1, you attach it to the end of the chain. Else you step back previous_node.level - next_node.level + 1 items up the chain and use that node as the parent. We assume that level 0 is document level.
A bit of code to illustrate this:
def nest(items):
    ret = {'level': 0}  # 'document level'
    path = [ret]  # chain from document level to the last created node
    for item in items:
        node = dict(item)  # a copy of item, lest we alter input
        old_level = path[-1]['level']  # last element's
        new_level = node['level']
        delta = new_level - old_level - 1
        if delta < 0:
            # same or lower level: step back up the chain
            path = path[:delta]
        path[-1].setdefault('_children', []).append(node)
        path.append(node)
    return ret

from pprint import PrettyPrinter
pr = PrettyPrinter(indent=2).pprint
pr(nest(toc))
and you see
{ '_children': [ { '_children': [ { 'entryno': 2,
                                    'level': 2,
                                    'pageno': 19,
                                    'title': 'title b'}],
                   'entryno': 1,
                   'level': 1,
                   'pageno': 17,
                   'title': 'title a'},
                 { 'entryno': 3, 'level': 1, 'pageno': 25, 'title': 'title c'}],
  'level': 0}
Under _children we list nested nodes.
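To get from such a nested structure to XML, one option is a short recursive walk over _children. This is only a sketch: the helper name to_element and the root tag are assumptions, and the input is a hand-written dict of the same shape as the output printed above.

```python
import xml.etree.ElementTree as ET


def to_element(node, tag="entry"):
    """Recursively convert a nested dict with '_children' lists to an Element."""
    elem = ET.Element(tag)
    if "entryno" in node:
        elem.set("id", str(node["entryno"]))
    # plain data items become child elements
    for key, value in node.items():
        if key in ("entryno", "level", "_children"):
            continue
        ET.SubElement(elem, key).text = str(value)
    # nested nodes are converted the same way and appended after the data
    for child in node.get("_children", []):
        elem.append(to_element(child))
    return elem


# hand-written input with the same shape as the nested result above
nested = {
    "level": 0,
    "_children": [
        {"entryno": 1, "level": 1, "pageno": 17, "title": "title a",
         "_children": [
             {"entryno": 2, "level": 2, "pageno": 19, "title": "title b"},
         ]},
        {"entryno": 3, "level": 1, "pageno": 25, "title": "title c"},
    ],
}
xml_text = ET.tostring(to_element(nested, tag="root"), encoding="unicode")
print(xml_text)
```

The recursion mirrors the data: each call handles one node and delegates its children, so the depth of nesting never has to be known in advance.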
toc = [
    {'entryno': 1, 'level': 1, 'pageno': 17, 'title': 'title a'},
    {'entryno': 2, 'level': 2, 'pageno': 18, 'title': 'title d'},
    {'entryno': 3, 'level': 3, 'pageno': 19, 'title': 'title e'},
    {'entryno': 4, 'level': 4, 'pageno': 20, 'title': 'title b'},
    {'entryno': 5, 'level': 5, 'pageno': 25, 'title': 'title c'},
]

blevel = 0  # level of the most recently opened entry
ret = ""
for i in toc:
    # close every open entry at this level or deeper
    while blevel >= i['level']:
        ret += "%s</entry>\n" % (" " * blevel)
        blevel -= 1
    blevel = i['level']
    indent = " " * i['level']
    ret += "%s<entry id=\"%i\">\n" % (indent, i['entryno'])
    indent += " "
    # emit the remaining data items as child elements
    for a in i:
        if a not in ['entryno', 'level']:
            ret += "%s<%s>%s</%s>\n" % (indent, a, i[a], a)
# close whatever is still open
while blevel > 0:
    ret += "%s</entry>\n" % (" " * blevel)
    blevel -= 1
print(ret)
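One caveat with building XML by string formatting: a title containing &, < or > is emitted unescaped and produces malformed XML. A minimal sketch of one way to guard against that, using xml.sax.saxutils.escape from the standard library; the helper name field_line is hypothetical.

```python
from xml.sax.saxutils import escape


def field_line(indent, name, value):
    """Format one child element, escaping &, < and > in the value."""
    return "%s<%s>%s</%s>\n" % (indent, name, escape(str(value)), name)


line = field_line("  ", "title", "Smith & Wesson <duct tape>")
print(line, end="")
```

Calling such a helper in place of the inline format string for the data items would keep the rest of the loop unchanged.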