
Need urllib.urlretrieve and urllib2.OpenerDirector together

I'm writing a script in Python 2.7 which uses a urllib2.OpenerDirector instance via urllib2.build_opener() to take advantage of the urllib2.HTTPCookieProcessor class, because I need to store and re-send the cookies I get:

import cookielib, urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))

However, after making several requests and moving the cookies around, I eventually need to retrieve a list of URLs. I wanted to use urllib.urlretrieve() because I read that it downloads the file in chunks, but I can't, because I need to send my cookies with the request, and urllib.urlretrieve() uses a urllib.URLOpener, which, unlike OpenerDirector, has no support for cookie handlers.
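For illustration, this is roughly the call I would like to make for each URL (the URL and filename are just placeholders); the chunked download itself is what I want, but the request bypasses my cookie-aware opener:

import urllib

# urlretrieve goes through urllib's own FancyURLopener internally,
# so the cookies held by my opener above never get sent with this request.
urllib.urlretrieve('http://example.com/some/file', 'some_file')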

What's the reason for this strange split in functionality, and how can I achieve my goal?


urlretrieve is an old interface from urllib; it was around long before urllib2 existed. It has no session-handling capabilities: it just downloads files. The newer urllib2 provides a much better way to deal with sessions, passwords, proxies and so on through its handler interfaces and the OpenerDirector class. To download the URLs as files, just use urllib2's urlopen call through the same opener you created (opener.open(), or install the opener with urllib2.install_opener() and call urllib2.urlopen()). This will maintain the session.
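A minimal sketch of that idea, assuming the CookieJar/opener from the question; the download() helper, chunk size and URLs here are placeholders I made up, and the read/write loop just mimics what urlretrieve does internally:

import cookielib
import urllib2

cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

def download(url, filename, chunk_size=16 * 1024):
    # opener.open() sends the cookies already in cookie_jar and stores any
    # new ones from the response, so the session carries across downloads.
    response = opener.open(url)
    try:
        with open(filename, 'wb') as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    finally:
        response.close()

# Placeholder URLs; reuse the same opener for every file in the list.
for i, url in enumerate(['http://example.com/a', 'http://example.com/b']):
    download(url, 'download_%d.bin' % i)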

