BeautifulSoup - Help me pick out divs and classes
Here's my HTML code:
<div class="BlockA">
<h4>BlockA</h4>
<div class="name">John Smith</div>
<div class="number">2</div>
<div class="name">Paul Peterson</div>
<div class="number">14</div>
</div>
<div class="BlockB">
<h4>BlockB</h4>
<div class="name">Steve Jones</div>
<div class="number">5</div>
</div>
Notice BlockA and BlockB. Both contain the same elements, i.e. name and number, but are inside separate classes. I'm new to Python and was thinking of trying something like:
parsedHTML = soup.findAll("div", attrs={"name" : "number"})
but that just gives me a blank screen. Is it possible for me to do a findAll from within BlockA, display the data, then start another loop from BlockB and do the same?
Thanks.
EDIT: For those asking, I want to simply loop through the values and output in JSON like this:
BlockA
John Smith
2
Paul Peterson
14
BlockB
Steve Whoever
123
Mr Whathisface
23
You want to find divs that contain a class attribute of "name" or "number"?
>>> import re
>>> soup.findAll("div", {"class":re.compile("name|number")})
[<div class="name">John Smith</div>, <div class="number">2</div>, <div class="name">Paul Peterson</div>, <div class="number">14</div>, <div class="name">Steve Jones</div>, <div class="number">5</div>]
You need to use a list of possible class values.
soup.findAll('div', {'class': ['name', 'number']})
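To make the difference from the question's attempt concrete, here is a minimal sketch, assuming the newer bs4 package (where findAll is spelled find_all) and the built-in html.parser backend. The html string is copied from the question; the variable names no_match, matches and texts are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="BlockA">
<h4>BlockA</h4>
<div class="name">John Smith</div>
<div class="number">2</div>
<div class="name">Paul Peterson</div>
<div class="number">14</div>
</div>
<div class="BlockB">
<h4>BlockB</h4>
<div class="name">Steve Jones</div>
<div class="number">5</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# attrs={"name": "number"} asks for divs carrying a literal name="number"
# attribute. No div in the snippet has one, so the result is empty.
no_match = soup.find_all("div", attrs={"name": "number"})

# Filtering on class with a list matches divs whose class is any listed value.
matches = soup.find_all("div", class_=["name", "number"])
texts = [d.get_text() for d in matches]
print(texts)
```

The outer BlockA and BlockB divs are not returned because their class is neither name nor number.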
After seeing your edit:
def grab_content(heading):
    # Take the first child (the text) of every sibling tag after the heading.
    siblings = [s.contents[0] for s in heading.findNextSiblings()]
    return {heading.contents[0]: siblings}

headings = soup.findAll('h4')
[grab_content(h) for h in headings]
And the output for your original HTML snippet would be:
[{u'BlockA': [u'John Smith', u'2', u'Paul Peterson', u'14']},
{u'BlockB': [u'Steve Jones', u'5']}]
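If you would rather search inside each block separately, as the question asks, and emit real JSON, something along these lines might work. It is only a sketch, assuming the newer bs4 package, that every name div is followed by its matching number div, and that the numbers are integers. The JSON shape (a list of name/number objects per block) is my own guess at what is wanted, not something the question specifies.

```python
import json

from bs4 import BeautifulSoup

html = """
<div class="BlockA">
<h4>BlockA</h4>
<div class="name">John Smith</div>
<div class="number">2</div>
<div class="name">Paul Peterson</div>
<div class="number">14</div>
</div>
<div class="BlockB">
<h4>BlockB</h4>
<div class="name">Steve Jones</div>
<div class="number">5</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

result = {}
for heading in soup.find_all("h4"):
    # The enclosing div of each h4 is the BlockA / BlockB container.
    block = heading.parent
    # Search only inside this block, not the whole document.
    names = [d.get_text(strip=True) for d in block.find_all("div", class_="name")]
    numbers = [d.get_text(strip=True) for d in block.find_all("div", class_="number")]
    # Assumption: names and numbers line up one-to-one, in document order.
    result[heading.get_text(strip=True)] = [
        {"name": n, "number": int(v)} for n, v in zip(names, numbers)
    ]

print(json.dumps(result, indent=2))
```

Calling find_all on block rather than on soup is what restricts each search to one container, so the two blocks never mix.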