How to retrieve these elements from a webpage?
I have a webpage in HTML with these elements:
<div class="content_page">
<a href="/earth" class="nametessera" >earth</a>
</div>
<div class="content_page">
<a href="/world" class="nametessera" >world</a>
</div>
<div class="content_page">
<a href="/planet" class="nametessera">planet</a>
</div>
...
I need to retrieve /earth, /world, /planet, etc., that is, the href of every <a> tag with class "nametessera".
How can I do this with Python?
Short answer:
Use BeautifulSoup to parse the page and extract the URLs, then use urllib2 or pycurl to download them.
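The hrefs in the question are relative, so they have to be resolved against the address of the page before they can be downloaded. Here is a minimal sketch of that step, assuming Python 3 (where urllib2 became urllib.request). BASE_URL is a hypothetical address, not one from the question, and download() is an illustrative helper name.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical address of the page the links were scraped from.
BASE_URL = "http://example.com/index.html"

# The relative hrefs from the question.
hrefs = ["/earth", "/world", "/planet"]

# Resolve each relative href against the page address.
absolute_urls = [urljoin(BASE_URL, href) for href in hrefs]


def download(url, timeout=10):
    """Return the raw bytes served at url (illustrative helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling download(url) for each entry in absolute_urls would then fetch the pages. The sketch only defines the function, so nothing is requested until you call it.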
[Edit:]
Adding on to the examples below, but using only the href contained in each div:
>>> alldiv = soup.findAll('div', { "class" : "content_page" })
>>> for div in alldiv: print(div.a)
...
<a href="/earth" class="nametessera">earth</a>
<a href="/world" class="nametessera">world</a>
<a href="/planet" class="nametessera">planet</a>
>>> for div in alldiv: print(div.a['href'])
...
/earth
/world
/planet
Similarly, you could select the links directly by class and read each one's 'href' the same way:
allHref = soup.findAll('a', { "class" : "nametessera" })
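For readers on Python 3, the same lookup can be sketched with the newer bs4 package. This assumes bs4 is installed; the answer above uses the older API, and the html string below simply repeats the markup from the question.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
html = """
<div class="content_page">
<a href="/earth" class="nametessera">earth</a>
</div>
<div class="content_page">
<a href="/world" class="nametessera">world</a>
</div>
<div class="content_page">
<a href="/planet" class="nametessera">planet</a>
</div>
"""

# Parse with the parser that ships with Python.
soup = BeautifulSoup(html, "html.parser")

# Select every <a> with class "nametessera" and collect its href.
links = [a["href"] for a in soup.find_all("a", class_="nametessera")]
print(links)
```

Here find_all is the bs4 spelling of findAll, and class_ has a trailing underscore because class is a reserved word in Python.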
You parse the HTML with Beautiful Soup.
The documentation is here.