How do I process a nested list?
Suppose I have a bulleted list like this:
* list item 1
* list item 2 (a parent)
** list item 3 (a child of list item 2)
** list item 4 (a child of list item 2 as well)
*** list item 5 (a child of list item 4 and a grand-child of list item 2)
* list item 6
I'd like to parse that into a nested list or some other data structure wh开发者_高级运维ich makes the parent-child relationship between elements explicit (rather than depending on their contents and relative position). For example, here's a list of tuples containing an item and a list of its children (and so forth):
Edit: Hopefully, a more correct list example, where each element in the list is a tuple containing: a bullet's text and, if applicable, a list of children (in the same form).
[('list item 1',), ('list item 2', [('list item 3',), ('list item 4', [('list item 5',)])] ('list item 6',)]
[('list item 1',),
('list item 2', [('list item 3',), ('list item 4', [('list item 5',)])]),
('list item 6',)]
I've attempted to do this with plain Python and some experimentation with Pyparsing, but I'm not making progress. I'm left with two major questions:
- What's the strategy I need to employ to make this work? I know recursion is part of the solution, but I'm having a hard time making the connection between this and, say, a Fibonacci sequence.
- I'm certain I'm not the first person to have done this, but I don't know the terminology of the problem to make fruitful searches for more information on this topic. What problems are related to this so that I can learn more about solving these kinds of problems in general?
In the view of search algorithm, the bullet you give is actually a sequence generated by Depth-First-Search. So my strategy is just to rebuild the tree structure with the dfs-sequence.
Following is the python code:
from collections import deque
def dfsBullet(bullet,depth):
"""
parse the subtree with depth and the startnode of bullet[0]
"""
li = []
if depth != 0:
item = bullet.popleft()
li.append(item.split(' ',1)[1])
while (len(bullet) != 0):
item = bullet[0]
#apply same algo to the child node
if len(item.split(' ',1)[0]) > depth:
sublist = dfsBullet(bullet, len(item.split(' ')[0]))
#we have traverse all childnode, so go back
else:
return li
#add child tree to the list
li.append(sublist)
return li
I can't parse your desired result -- it seems to have more open parentheses than corresponding closed ones and I don't understand the logic behind it.
To make a tree structure explicit, what about, e.g.:
data = '''* list item 1
* list item 2
** list item 3
** list item 4
*** list item 5
* list item 6'''.splitlines()
class Node(object):
def __init__(self, payload):
self.payload = payload
self.children = []
def show(self, indent):
print ' '*indent, self.payload
for c in self.children:
c.show(indent+2)
def makenest(linelist):
rootnode = Node(None)
stack = [(rootnode, 0)]
for line in linelist:
for i, c in enumerate(line):
if c != '*': break
stars, payload = line[:i], line[i:].strip()
curlev = len(stars)
curnod = Node(payload)
while True:
parent, level = stack[-1]
if curlev > level: break
del stack[-1]
# a child node of the current top-of-stack
parent.children.append(curnod)
stack.append((curnod, curlev))
rootnode.show(0)
makenest(data)
The show
method of course exists just for the purpose of verifying that the part about parsing the strings and creating the tree has worked correctly. If you can specify more precisely exactly how it is that you want to transform your tree into nested tuples and lists, I'm sure it will be easy to add to class Node
the appropriate (and probably recursive) method -- so, could you please give this missing specification...?
Edit: since the OP has clarified now, it does, as predicted, become easy to satisfy the spec. Just add to class Node
the following method:
def emit(self):
if self.children:
return (self.payload,
[c.emit() for c in self.children])
else:
return (self.payload,)
and change the last three lines of the code (last one of makenest
, a blank one, and the module-level call to makenest
) to:
return [c.emit() for c in rootnode.children]
print(makenest(data))
(The parentheses after print
are redundant but innocuous in Python 2, required in Python 3, so I put them there just in case;-).
With these tiny changes, my code runs as requested, now emitting
[('list item 1',), ('list item 2', [('list item 3',), ('list item 4', [('list item 5',)])]), ('list item 6',)]
Keep track of the current "depth" you're parsing at.
- If the depth of the next line is more than the current depth, recursively call the parser with the new depth, then add the result from that call to the current list.
- If the depth of the next line is equal to the current depth, add it to the current list.
- If the depth of the next line is less than the current depth, return the current list.
精彩评论