urllib2 and cookielib thread safety
As far as I've been able to tell, cookielib isn't thread safe; then again, the post stating so is five years old, so it might be wrong.
Nevertheless, I've been wondering: if I spawn a class like this

    import cookielib
    import urllib2

    class Acc:
        def __init__(self, login, password):
            self.user = login
            self.password = password
            # per-instance jar and opener, so each worker keeps its own cookies
            self.jar = cookielib.CookieJar()
            self.cookie = urllib2.HTTPCookieProcessor(self.jar)
            self.opener = urllib2.build_opener(self.cookie)
            self.headers = {}

        def login(self):
            return False  # Some magic, irrelevant

        def fetch(self, url):
            req = urllib2.Request(url, None, self.headers)
            res = self.opener.open(req)
            return res.read()
for each worker thread, would it work? (Or is there a better approach?) Each thread would use its own account, so the fact that workers don't share their cookies is not a problem.
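For concreteness, a minimal sketch of the per-thread usage described above (the account list and URLs are placeholders):

    import threading

    def run_worker(login, password, urls):
        acc = Acc(login, password)  # one Acc, and thus one CookieJar, per thread
        acc.login()
        for url in urls:
            acc.fetch(url)

    accounts = [('alice', 's3cret'), ('bob', 'hunter2')]  # hypothetical accounts
    threads = [threading.Thread(target=run_worker,
                                args=(user, pwd, ['http://example.com']))
               for user, pwd in accounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()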
You want to use pycurl (the Python interface to libcurl). It's thread-safe and supports cookies, HTTPS, etc. The interface is a bit strange, but it just takes some getting used to.
I've only used pycurl with HTTPBasicAuth + SSL, but I did find an example using pycurl and cookies here. I believe you'll need to update the pycurl.COOKIEFILE (line 74) and pycurl.COOKIEJAR (line 82) to have some unique name (maybe keying off of id(self.crl)). As I remember, you'll need to create a new pycurl.Curl() for each request to maintain thread safety.
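As a rough sketch of that advice (the cookie_path naming is an assumption; any per-worker unique path works):

    import pycurl
    from StringIO import StringIO

    def fetch(url, cookie_path):
        buf = StringIO()
        crl = pycurl.Curl()  # a fresh handle per request
        crl.setopt(pycurl.URL, url)
        crl.setopt(pycurl.COOKIEFILE, cookie_path)  # cookies are read from here
        crl.setopt(pycurl.COOKIEJAR, cookie_path)   # and written back on close
        crl.setopt(pycurl.WRITEFUNCTION, buf.write)
        crl.perform()
        crl.close()
        return buf.getvalue()

    # each worker passes its own file, e.g.
    # fetch('http://example.com', '/tmp/cookies-worker-1.txt')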
You can check the implementation of the library in [python_install_path]/lib/cookielib.py to confirm that cookielib.CookieJar is thread safe. This means that if you share one instance of CookieJar between several connections in different threads, you will not even see inconsistent reads of the cookie set, because CookieJar takes the lock self._cookies_lock internally.
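To illustrate, a small sketch of the shared-jar setup that claim allows (the URL is a placeholder):

    import threading
    import urllib2
    import cookielib

    shared_jar = cookielib.CookieJar()  # guarded internally by _cookies_lock

    def worker(url):
        # each thread builds its own opener around the one shared jar
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(shared_jar))
        opener.open(url).read()

    threads = [threading.Thread(target=worker, args=('http://example.com',))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()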
I had the same question as you. If you do not use pycurl, I think you must call urllib2.install_opener(self.opener) before each urllib2.urlopen. Maybe I should use pycurl too; urllib2 is not so smart.
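For reference, a sketch contrasting the two call styles; note that install_opener swaps a process-global opener, while calling opener.open directly (as the question's fetch() does) touches no global state:

    import urllib2
    import cookielib

    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cookielib.CookieJar()))

    # global style: every urllib2.urlopen in the process now uses this opener
    urllib2.install_opener(opener)
    data = urllib2.urlopen('http://example.com').read()

    # local style: no global state involved
    data = opener.open('http://example.com').read()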