download a file with mechanize

I have a browser instance that has opened a page. I would like to download and save all the links on it (they are PDFs). Does someone know how to do it?

Thanks


import os, re, urllib, urllib2, cookielib, urlparse
# http://www.crummy.com/software/BeautifulSoup/ - required
from BeautifulSoup import BeautifulSoup

HOST = 'https://www.adobe.com/'

# A cookie-aware opener so any session cookies persist across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

req = opener.open(HOST + 'pdf')
response = req.read()

soup = BeautifulSoup(response)
# Every anchor whose href contains ".pdf"
pdfs = soup.findAll(name='a', attrs={'href': re.compile(r'\.pdf')})
for pdf in pdfs:
    # urljoin resolves relative hrefs and leaves absolute ones alone
    url = urlparse.urljoin(HOST, pdf['href'])
    # Without an explicit filename, urlretrieve saves to a temp file
    filename = os.path.basename(urlparse.urlsplit(url).path)
    try:
        # http://docs.python.org/library/urllib.html#urllib.urlretrieve
        urllib.urlretrieve(url, filename)
    except Exception, e:
        print 'cannot obtain url %s' % (url,)
        print 'from href %s' % (pdf['href'],)
        print e
    else:
        print 'downloaded file'
        print url
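
Since the question asks about mechanize specifically, here is a minimal sketch of the same idea using mechanize's own link filtering; the page URL is a placeholder, and the url_regex pattern assumes the PDF hrefs contain ".pdf":

import os, urlparse
import mechanize

br = mechanize.Browser()
br.open('https://www.adobe.com/pdf')  # placeholder URL - open your own page here

# links() accepts predicates such as url_regex for filtering anchors
for link in br.links(url_regex=r'\.pdf'):
    url = link.absolute_url
    filename = os.path.basename(urlparse.urlsplit(url).path)
    # Fetch through the same browser so cookies are reused
    data = br.open(url).read()
    br.back()  # step back to the listing page before the next link
    with open(filename, 'wb') as f:
        f.write(data)
    print 'downloaded', url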


May not be the answer you're looking for, but I've used the lxml and requests libraries together for automated anchor fetching:

Relevant lxml examples: http://lxml.de/lxmlhtml.html#examples (replace urllib with requests)

And the requests library homepage: http://docs.python-requests.org/en/latest/index.html

It's not as compact as mechanize, but it does offer more control.
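
A minimal sketch of that combination, modeled on the lxml examples linked above; the page URL is a placeholder and the ".pdf" suffix check is just one way to pick out the anchors:

import os
import urlparse  # Python 2, to match the answer above

import lxml.html
import requests

PAGE = 'https://www.adobe.com/pdf'  # placeholder - substitute your page

r = requests.get(PAGE)
doc = lxml.html.fromstring(r.content)
# Rewrite every relative href into an absolute URL in one pass
doc.make_links_absolute(PAGE)

for url in doc.xpath('//a/@href'):
    if not url.lower().endswith('.pdf'):
        continue
    filename = os.path.basename(urlparse.urlsplit(url).path)
    # Stream the body so large PDFs are not held entirely in memory
    pdf = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in pdf.iter_content(chunk_size=8192):
            f.write(chunk)
    print 'downloaded', url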

