Close urllib2 connection
I'm using urllib2 to load files from ftp- and http-servers.
Some of the servers support only one connection per IP. The problem is, that urllib2 does not close the connection instantly. Look at the example-program.
from urllib2 import urlopen
from time import sleep
url = 'ftp://user:pass@host/big_file.ext'
def load_file(url):
f = urlopen(url)
loaded = 0
while True:
data = f.read(1024)
if data == '':
break
loaded += len(data)
f.close()
#sleep(1)
print('loaded {0}'.format(loaded))
load_file(url)
load_file(url)
The code loads two files (here the two files are the same) from an ftp-server which supports only 1 connection. This will print the following log:
loaded 463675266
Traceback (most recent call last):
File "conection_test.py", line 20, in <module>
load_file(url)
File "conection_test.py", line 7, in load_file
f = urlopen(url)
File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
File "/usr/lib/python2.6/urllib.py", line 854, in __init__
self.init()
File "/usr/lib/python2.6/urllib.py", line 860, in init
self.ftp.connect(开发者_StackOverflowself.host, self.port, self.timeout)
File "/usr/lib/python2.6/ftplib.py", line 134, in connect
self.welcome = self.getresp()
File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>
So the first file is loaded and the second fails because the first connection was not closed.
But when i use sleep(1)
after f.close()
the error does not occurr:
loaded 463675266
loaded 463675266
Is there any way to force close the connection so that the second download would not fail?
The cause is indeed a file descriptor leak. We found also that with jython, the problem is much more obvious than with cpython. A colleague proposed this sollution:
fdurl = urllib2.urlopen(req,timeout=self.timeout) realsock = fdurl.fp._sock.fp._sock** # we want to close the "real" socket later req = urllib2.Request(url, header) try: fdurl = urllib2.urlopen(req,timeout=self.timeout) except urllib2.URLError,e: print "urlopen exception", e realsock.close() fdurl.close()
The fix is ugly, but does the job, no more "too many open connections".
Biggie: I think it's because the connection is not shutdown().
Note close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close().
You could try something like this before f.close():
import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
as for Python 2.7.1 urllib2 indeed leaks a file descriptor: https://bugs.pypy.org/issue867
Alex Martelli answers to the similar question. Read this : should I call close() after urllib.urlopen()?
In a nutshell:
import contextlib
with contextlib.closing(urllib.urlopen(u)) as x:
# ...
精彩评论