How do I get a list of all parent tags in BeautifulSoup?

Let's say I have a structure like this:

<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>

If I point to bookmark, what would be the command to just extract all of the folder lines? For example,

bookmarks = soup.findAll('bookmark')

then beautifulsoupcommand(bookmarks[0]) would return:

[<folder name="folder1">,<folder name="folder2">]

I'd also want to know when the ending tags hit too. Any ideas?

Thanks in advance!


Here is my stab at it:

>>> from BeautifulSoup import BeautifulSoup
>>> html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.find_all('bookmark')
>>> [p.get('name') for p in bookmarks[0].find_all_previous(name = 'folder')]
[u'folder2', u'folder1']

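The key difference from @eumiro's answer is that I use find_all_previous instead of find_parents. When I tested @eumiro's solution, find_parents returned only the immediate parent whenever the parent and grandparent have the same tag name. Judging by the output below, this comes from the parsed tree rather than from find_parents itself: the only ancestor listed after folder2 is the document root (the None entry), so the parser seems to have closed folder1 when it met the second <folder> instead of nesting the two. Keep in mind that find_all_previous matches every earlier folder tag in document order, whether or not it is an ancestor.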

>>> [p.get('name') for p in bookmarks[0].find_parents('folder')]
[u'folder2']

>>> [p.get('name') for p in bookmarks[0].find_parents()]
[u'folder2', None]

It does return two generations of parents if the parent and grandparent are differently named.

>>> html = """<folder name="folder1">
     <folder_parent name="folder2">
          <bookmark href="link.html">
     </folder_parent>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.find_all('bookmark')
>>> [p.get('name') for p in bookmarks[0].find_parents()]
[u'folder2', u'folder1', None]
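
To make the difference between the two calls concrete, here is a minimal sketch. It assumes the newer bs4 package with the built-in "html.parser" backend (not the setup used in the session above), where same-named tags stay nested. The extra "sibling" folder and the explicit </bookmark> closing tags are made up for illustration.

```python
from bs4 import BeautifulSoup

# Assumed input: the question's structure plus one hypothetical sibling folder.
html = """<folder name="folder1">
  <folder name="sibling"><bookmark href="other.html"></bookmark></folder>
  <folder name="folder2"><bookmark href="link.html"></bookmark></folder>
</folder>"""

# Assumption: bs4 with the "html.parser" backend, which keeps nested tags nested.
soup = BeautifulSoup(html, "html.parser")
target = soup.find("bookmark", href="link.html")

# Every <folder> that starts earlier in the document, ancestor or not.
previous = [f.get("name") for f in target.find_all_previous("folder")]

# Only the <folder> tags that actually enclose the bookmark, nearest first.
ancestors = [f.get("name") for f in target.find_parents("folder")]

print(previous)   # ['folder2', 'sibling', 'folder1']
print(ancestors)  # ['folder2', 'folder1']
```

Under these assumptions find_parents is the safer choice when bookmarks can have sibling folders, since find_all_previous also picks up folders that were already closed.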


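bookmarks[0].findParents('folder') returns a list of all enclosing folder nodes, nearest first. You can then iterate over them and read each one's name attribute, e.g. parent['name'] (parent.name would give the tag name, 'folder', instead).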
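
findParents gives the enclosing folders but no signal for where each folder closes, which the question also asks about. One possible approach, swapping BeautifulSoup for the standard library's event-based html.parser (my assumption, not something either answer uses), is to keep a stack of open folders and record open and close events as they arrive. FolderTracker and its attributes are hypothetical names for this sketch.

```python
from html.parser import HTMLParser

html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""


class FolderTracker(HTMLParser):
    """Hypothetical helper: tracks open folders and logs open/close events."""

    def __init__(self):
        super().__init__()
        self.open_folders = []  # stack of names of folders currently open
        self.events = []        # (kind, detail) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "folder":
            self.open_folders.append(attrs.get("name"))
            self.events.append(("open", attrs.get("name")))
        elif tag == "bookmark":
            # Snapshot the enclosing folders, outermost first.
            self.events.append(
                ("bookmark", (attrs.get("href"), list(self.open_folders)))
            )

    def handle_endtag(self, tag):
        # A closing </folder> ends the most recently opened folder.
        if tag == "folder" and self.open_folders:
            self.events.append(("close", self.open_folders.pop()))


tracker = FolderTracker()
tracker.feed(html)
tracker.close()

for event in tracker.events:
    print(event)
# ('open', 'folder1')
# ('open', 'folder2')
# ('bookmark', ('link.html', ['folder1', 'folder2']))
# ('close', 'folder2')
# ('close', 'folder1')
```

The bookmark event carries the folder list in the outermost-first order the question asks for, and the close events mark where each folder ends.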
