Python - BeautifulSoup - HTML Parsing
Here is a fragment of the site's HTML:
<td class='vcard' id='results100212571'>
<h2 class="custom_seeMore">
<a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
</h2>
<div class='clearer'></div>
<div class='adr'>
<span class='postal-code'>00000</span>
<span class='locality'>City</span>
<span class='street-address'>Address</span>
</div>
<p class="tel">Phone number</p>
and this is how I try to parse it:
for element in BeautifulSoup(page).findAll('td'):
    if element.find('a', {'class' : 'fn openPreview'}):
        print element.find('a', {'class' : 'fn openPreview'}).string
    if element.find('span', {'class' : 'postal-code'}):
        print element.find('span', {'class' : 'postal-code'}).string
    if element.find('span', {'class' : 'locality'}):
        print element.find('span', {'class' : 'locality'}).string
    if element.find('span', {'class' : 'street-address'}):
        print element.find('span', {'class' : 'street-address'}).string
    if element.find('p', {'class' : 'tel'}):
        print element.find('p', {'class' : 'tel'}).string
I know it's very amateur code, but it almost works: every class prints its content except 'fn openPreview'. For that one,
print element.find('a', {'class' : 'fn openPreview'}).string
prints None.
Please help me: how should I parse it?
According to the BeautifulSoup documentation, element.string will be None if element has multiple children. Here the <a> tag has two: the text "Hotel Name" and the <span class="seeMore"> element.
In your case,
print element.find('a', {'class' : 'fn openPreview'}).contents[0].string
will print "Hotel Name".
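For anyone who wants to check this on their own machine, here is a minimal sketch. It assumes the newer bs4 package (BeautifulSoup 4) with the built-in "html.parser" backend rather than the BeautifulSoup 3 used in the question, and it works on a trimmed copy of the fragment. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup  # assumption: bs4 (BeautifulSoup 4), not BeautifulSoup 3

# Trimmed copy of the fragment from the question (closing </td> added)
html = """
<td class='vcard' id='results100212571'>
<h2 class="custom_seeMore">
<a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
</h2>
<p class="tel">Phone number</p>
</td>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="openPreview")
tel = soup.find("p", class_="tel")

# <a> has two children (a text node and a <span>), so .string is None
link_string = link.string
# <p class="tel"> has a single text child, so .string returns that text
tel_string = tel.string
# Taking the first child explicitly gives the hotel name
hotel_name = link.contents[0].strip()
# get_text() joins every nested string, including the <span> text
all_text = link.get_text()

print(link_string)
print(tel_string)
print(hotel_name)
print(all_text)
```

Note that get_text() is not a drop-in fix here, because it would also pull in the "See More..." text from the nested span. Picking the first child avoids that.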