
How to deal with deflated response by urllib2? [duplicate]

This question already has answers here: Python: Inflate and Deflate implementations (2 answers) Closed 4 years ago.

I currently use the following code to decompress gzipped responses from urllib2:

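import gzip
import StringIO
import urllib2

opener = urllib2.build_opener()
response = opener.open(req)  # req: the URL or urllib2.Request prepared earlier
data = response.read()
if response.headers.get('content-encoding', '') == 'gzip':
    # GzipFile needs a file-like object, so wrap the raw bytes
    data = StringIO.StringIO(data)
    gzipper = gzip.GzipFile(fileobj=data)
    html = gzipper.read()
else:
    html = data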

Does it handle deflated responses too, or do I need to write separate code to handle them?


You can try

import zlib

if response.headers.get('content-encoding', '') == 'deflate':
    html = zlib.decompress(response.read())

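If that fails, here is another way, which I found in the requests source code. Some servers send a raw deflate stream without the zlib header that zlib.decompress expects by default, and the negative wbits value tells zlib to accept that: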

if response.headers.get('content-encoding', '') == 'deflate':
    html = zlib.decompressobj(-zlib.MAX_WBITS).decompress(response.read())
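A minimal sketch that folds both suggestions into one helper, assuming the whole body fits in memory. The name `inflate` is hypothetical. It first tries the zlib-wrapped format that RFC 2616 specifies for `deflate`, then falls back to a raw deflate stream:

```python
import zlib


def inflate(data):
    """Decode an HTTP 'deflate' body, whichever framing the server used.

    RFC 2616 defines 'deflate' as zlib-wrapped data (RFC 1950), but some
    servers send a raw deflate stream (RFC 1951) with no zlib header.
    """
    try:
        # Default wbits expects the 2-byte zlib header and Adler-32 trailer.
        return zlib.decompress(data)
    except zlib.error:
        # Negative wbits tells zlib to expect a raw deflate stream instead.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

With urllib2 it would be called as `html = inflate(response.read())` when the Content-Encoding header is `deflate`.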


There is a better way outlined at:

  • http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompression/

The author explains how to decompress chunk by chunk, rather than all at once in memory. This is the preferred method when larger files are involved.
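The linked post's code is not reproduced here; this is a separate minimal sketch of the same idea using only `zlib`. Passing `16 + zlib.MAX_WBITS` to `zlib.decompressobj` makes it expect a gzip header, so the body can be read and decompressed in fixed-size pieces. The generator name `iter_gunzip` and the 8192-byte default chunk size are arbitrary choices:

```python
import zlib


def iter_gunzip(fileobj, chunk_size=8192):
    """Yield decompressed pieces of a gzip stream read from fileobj.

    Only one chunk of compressed input is read into memory at a time.
    """
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        out = d.decompress(chunk)
        if out:
            yield out
    # Emit anything still buffered inside the decompressor.
    tail = d.flush()
    if tail:
        yield tail
```

A urllib2 response object supports `read(n)`, so for a gzip-encoded response it could be passed directly, e.g. `for piece in iter_gunzip(response): out.write(piece)`.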

I also found this site helpful for testing:

  • http://carsten.codimi.de/gzip.yaws/


To answer the comment above: the HTTP spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3) says:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

I take that to mean the server should use the identity coding, i.e. send the body uncompressed. I've never seen a server that doesn't.
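Since the server is only expected to compress when asked, a client that wants compression should advertise it and then decode whatever coding the server actually chose. A minimal sketch, written for Python 3's `urllib.request` (the successor of urllib2); `build_request` and `decode_body` are hypothetical names, and a missing Content-Encoding header is treated as identity:

```python
import urllib.request
import zlib


def build_request(url):
    """Advertise compression explicitly instead of relying on server defaults."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def decode_body(body, content_encoding):
    """Undo the Content-Encoding the server chose (None means identity)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding == "gzip":
        # 16 + MAX_WBITS: expect a gzip header and trailer.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped, per the spec
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

After `response = urllib.request.urlopen(build_request(url))`, the body would be decoded with `decode_body(response.read(), response.headers.get('Content-Encoding'))`.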


You can see how urllib3 handles this in its source:

import zlib

binary_type = bytes  # urllib3 imports this from six; bytes works on Python 2.6+ and 3

class DeflateDecoder(object):

    def __init__(self):
        self._first_try = True
        self._data = binary_type()
        self._obj = zlib.decompressobj()

    def __getattr__(self, name):
        return getattr(self._obj, name)

    def decompress(self, data):
        if not data:
            return data

        if not self._first_try:
            return self._obj.decompress(data)

        self._data += data
        try:
            return self._obj.decompress(data)
        except zlib.error:
            self._first_try = False
            self._obj = zlib.decompressobj(-zlib.MAX_WBITS)
            try:
                return self.decompress(self._data)
            finally:
                self._data = None


class GzipDecoder(object):

    def __init__(self):
        self._obj = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def __getattr__(self, name):
        return getattr(self._obj, name)

    def decompress(self, data):
        if not data:
            return data
        return self._obj.decompress(data)
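The `wbits` arguments in the two classes select which container zlib expects. The following sketch (not urllib3 code) builds the same text in all three framings with `zlib.compressobj` and shows which `wbits` value decodes each, including the error that triggers DeflateDecoder's fallback on its first call:

```python
import zlib

TEXT = b"wbits selects the container format zlib expects. " * 8


def compress_with(wbits):
    """Compress TEXT using the framing chosen by wbits."""
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return co.compress(TEXT) + co.flush()


zlib_body = compress_with(zlib.MAX_WBITS)       # RFC 1950 zlib wrapper
raw_body = compress_with(-zlib.MAX_WBITS)       # RFC 1951 raw deflate, no header
gzip_body = compress_with(16 + zlib.MAX_WBITS)  # RFC 1952 gzip wrapper

# Each body round-trips when the decompressor is told the matching framing.
assert zlib.decompressobj().decompress(zlib_body) == TEXT
assert zlib.decompressobj(-zlib.MAX_WBITS).decompress(raw_body) == TEXT
assert zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(gzip_body) == TEXT

# A default decompressobj() rejects raw deflate: this zlib.error is what
# makes DeflateDecoder switch to -zlib.MAX_WBITS.
try:
    zlib.decompressobj().decompress(raw_body)
except zlib.error as exc:
    print("default decompressobj rejected raw deflate:", exc)
```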
