Need urllib.urlretrieve and urllib2.OpenerDirector together
I'm writing a script in Python 2.7 which uses a urllib2.OpenerDirector
instance via urllib2.build_opener()
to take advantage of the urllib2.HTTPCookieProcessor
class, because I need to store and re-send the cookies I get:
import cookielib
import urllib2

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
However, after making several requests and passing the cookies around, I eventually need to download a list of URLs. I wanted to use urllib.urlretrieve(),
because I read that it downloads the file in chunks, but I can't: I need to carry my cookies on the request, and
urllib.urlretrieve()
uses a urllib.URLOpener
, which has no support for cookie handlers the way OpenerDirector
does.
What's the reason for this odd split in functionality, and how can I achieve my goal?
urlretrieve
is an old interface from urllib
. It existed long before urllib2 came along, and it has no session-handling capabilities; it just downloads files. The newer urllib2
provides a much better way to deal with sessions, passwords, proxies and so on through its Handler interfaces and the OpenerDirector class. To download the URLs as files, just call the open() method of the opener you built (or install it globally with urllib2.install_opener() so that urllib2.urlopen() picks it up), using the same request objects you created. Because the opener carries the HTTPCookieProcessor, this maintains the session.
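As a minimal sketch of that idea, a small helper (the name retrieve_with_cookies is hypothetical, not a stdlib function) can replicate urlretrieve's chunked download while going through the cookie-aware opener. The Python 3 import fallback is included only so the snippet runs on either interpreter; the question itself targets Python 2.7.

```python
try:
    # Python 2: the modules discussed above
    import urllib2
    import cookielib
except ImportError:
    # Python 3 equivalents, for illustration only
    import urllib.request as urllib2
    import http.cookiejar as cookielib


def retrieve_with_cookies(opener, url, filename, chunk_size=8192):
    """Hypothetical replacement for urllib.urlretrieve() that goes
    through a cookie-aware OpenerDirector, reading the response in
    chunks so large files are never held in memory all at once."""
    response = opener.open(url)
    try:
        with open(filename, 'wb') as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    finally:
        response.close()
    return filename


# Build the same cookie-carrying opener as in the question; every
# download made through it re-sends the cookies stored in the jar.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
# retrieve_with_cookies(opener, 'http://example.com/file.bin', 'file.bin')
```

Since the helper only relies on the open()/read()/close() interface, it works with any OpenerDirector, whatever handlers it was built with.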