How to deal with deflated response by urllib2? [duplicate]
I currently use following code to decompress gzipped response by urllib2:
opener = urllib2.build_opener()
response = opener.open(req)
data = response.read()
if response.headers.get('content-encoding', '') == 'gzip':
data = StringIO.StringIO(data)
gzipper = gzip.GzipFile(fileobj=data)
html = gzipper.read()
Does it handle deflated response too or do I need to write sepera开发者_高级运维te code to handle deflated response?
You can try
if response.headers.get('content-encoding', '') == 'deflate':
html = zlib.decompress(response.read())
if fail, here is another way, I found it in requests source code,
if response.headers.get('content-encoding', '') == 'deflate':
html = zlib.decompressobj(-zlib.MAX_WBITS).decompress(response.read())
There is a better way outlined at:
- http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompression/
The author explains how to decompress chunk by chunk, rather than all at once in memory. This is the preferred method when larger files are involved.
Also found this helpful site for testing:
- http://carsten.codimi.de/gzip.yaws/
To answer from above comment, the HTTP spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3) says:
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
I take that to mean it should use identity. I've never seen a server that doesn't.
you can see the code in urllib3
class DeflateDecoder(object):
def __init__(self):
self._first_try = True
self._data = binary_type()
self._obj = zlib.decompressobj()
def __getattr__(self, name):
return getattr(self._obj, name)
def decompress(self, data):
if not data:
return data
if not self._first_try:
return self._obj.decompress(data)
self._data += data
try:
return self._obj.decompress(data)
except zlib.error:
self._first_try = False
self._obj = zlib.decompressobj(-zlib.MAX_WBITS)
try:
return self.decompress(self._data)
finally:
self._data = None
class GzipDecoder(object):
def __init__(self):
self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
def __getattr__(self, name):
return getattr(self._obj, name)
def decompress(self, data):
if not data:
return data
return self._obj.decompress(data)
精彩评论