BeautifulSoup - Help me pick out divs and classes

Posted 2023-02-25 21:04

Here's my HTML code:

<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>

<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>

Notice BlockA and BlockB. Both contain the same elements, i.e. name and number, but are inside separate classes. I'm new to Python and was thinking of trying something like:

parsedHTML = soup.findAll("div", attrs={"name" : "number"})

but that just gives me a blank screen. Is it possible for me to do a findAll from within BlockA, display the data, then start another loop from BlockB and do the same?

Thanks.

EDIT: For those asking, I want to simply loop through the values and output in JSON like this:

BlockA
    John Smith
    2
    Paul Peterson
    14

BlockB
    Steve Whoever
    123
    Mr Whathisface
    23

You want to find divs that contain a class attribute of "name" or "number"?

>>> import re
>>> soup.findAll("div", {"class":re.compile("name|number")})

[<div class="name">John Smith</div>, <div class="number">2</div>, <div class="name">Paul Peterson</div>, <div class="number">14</div>, <div class="name">Steve Jones</div>, <div class="number">5</div>]
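One caveat with the regex approach, shown as a small sketch that assumes BeautifulSoup 4 (the `bs4` package, where `findAll` is spelled `find_all`) and the built-in `html.parser`: the pattern is applied as a search, so an unanchored `name|number` also matches classes that merely contain those words. The `username` class below is made up for illustration and is not in the question's HTML.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: "username" is not in the original question
html = '<div class="username">jsmith</div><div class="name">John Smith</div>'
soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: matches any class that contains "name" or "number"
loose = soup.find_all("div", {"class": re.compile("name|number")})

# Anchored pattern: matches only the exact class names
strict = soup.find_all("div", {"class": re.compile(r"^(name|number)$")})

loose_text = [d.get_text() for d in loose]    # ['jsmith', 'John Smith']
strict_text = [d.get_text() for d in strict]  # ['John Smith']
print(loose_text, strict_text)
```

For the HTML in the question both patterns return the same six divs, so this only matters if the page has other, similarly named classes.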

You need to use a list of possible class values.

soup.findAll('div', {'class': ['name', 'number']})
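For completeness, here is a self-contained version of the list approach that can be run as-is. It is a sketch that assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`, so it uses the newer `find_all` and `class_` spellings rather than `findAll`; the variable names `HTML` and `texts` are just placeholders.

```python
from bs4 import BeautifulSoup

# The HTML from the question
HTML = """
<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>
<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A list of class values matches a div carrying any one of them
matches = soup.find_all("div", class_=["name", "number"])

# Pull out just the text, in document order
texts = [div.get_text(strip=True) for div in matches]
print(texts)
# ['John Smith', '2', 'Paul Peterson', '14', 'Steve Jones', '5']
```

Note that the outer BlockA and BlockB divs are not matched, because their class is neither name nor number.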

After seeing your edit:

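def grab_content(heading):
    # Text of every sibling element that follows the <h4> inside the same block
    siblings = [s.contents[0] for s in heading.findNextSiblings()]
    # Map the heading text (e.g. BlockA) to the list of values under it
    return {heading.contents[0]: siblings}

# One dict per <h4> heading, i.e. one per block
headings = soup.findAll('h4')
[grab_content(h) for h in headings]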

And the output for your original HTML snippet would be:

[{u'BlockA': [u'John Smith', u'2', u'Paul Peterson', u'14']},
 {u'BlockB': [u'Steve Jones', u'5']}]
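If the end goal is JSON grouped by block, one possible way to loop block by block is sketched below. It assumes BeautifulSoup 4 (`bs4`) with `html.parser`, and that every block is a div whose `<h4>` is a direct child; `blocks_to_dict` is a made-up helper name, not part of any library.

```python
import json
from bs4 import BeautifulSoup

# The HTML from the question
HTML = """
<div class="BlockA">
    <h4>BlockA</h4>
    <div class="name">John Smith</div>
    <div class="number">2</div>
    <div class="name">Paul Peterson</div>
    <div class="number">14</div>
</div>
<div class="BlockB">
    <h4>BlockB</h4>
    <div class="name">Steve Jones</div>
    <div class="number">5</div>
</div>
"""

def blocks_to_dict(html):
    """Hypothetical helper: map each <h4> title to the name/number values in its block."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for heading in soup.find_all("h4"):
        block = heading.parent  # the enclosing BlockA / BlockB div
        # Search only inside this block, so values from other blocks are not mixed in
        values = [
            div.get_text(strip=True)
            for div in block.find_all("div", class_=["name", "number"])
        ]
        result[heading.get_text(strip=True)] = values
    return result

data = blocks_to_dict(HTML)
print(json.dumps(data, indent=4))
```

Calling `find_all` on the block rather than on the whole soup is what restricts each search to a single block, which is the "findAll from within BlockA" the question asks about.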
