zlib.error: Error -3 while decompressing: incorrect header check

2023-01-05 00:49 问答作者：

I have a gzip file and I am trying to read it via Python as below:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decom开发者_如何转开发press(cdata)

it throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

Which is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).

choosing windowBits

But zlib can decompress all those formats:

to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16

See documentation in http://www.zlib.net/manual.html#Advanced (section inflateInit2)

examples

test data:

>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = '''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

obvious test for zlib:

>>> zlib.decompress(zlib_data)
'test'

test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'

test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'

the data is also compatible with gzip module:

>>> import gzip
>>> import StringIO
>>> fio = StringIO.StringIO(gzip_data)  # io.BytesIO for Python 3
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()

automatic header detection (zlib or gzip)

adding 32 to windowBits will trigger header detection

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'

using `gzip` instead

or you can ignore zlib and use gzip module directly; but please remember that under the hood, gzip uses zlib.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()

Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module, code below is straight from the python docs.

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

I just solved the "incorrect header check" problem when uncompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version)

Yes, this can be very frustrating. A typically shallow reading of the documentation presents Zlib as an API to Gzip compression, but by default (not using the gz* methods) it does not create or uncompress the Gzip format. You have to send this non-very-prominently documented flag.

This does not answer the original question, but it may help someone else that ends up here.

The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))

The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64 encoded bytes (from an HTTP POST) were being stored in a Django CharField (instead of a BinaryField).

When reading a CharField value from the database, str() is called on the value, without an explicit encoding, as can be seen in the Django source.

The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.str(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a str() method, then str() falls back to returning repr(object).

So, in the example, we are inadvertently base64-decoding

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').

NOTE specific to Django:

What's especially tricky: this issue only arises when retrieving a value from the database. See for example the test below, which passes (in Django 3.0.3):

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

To decompress incomplete gzipped bytes that are in memory, the answer by dnozay is useful but it misses the zlib.decompressobj call which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16 which is 31. For some background about wbits, see zlib.decompress.

Credit: answer by Yann Vernier which notes the the zlib.decompressobj call.

Funnily enough, I had that error when trying to work with the Stack Overflow API using Python.

I managed to get it working with the GzipFile object from the gzip directory, roughly like this:

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))

file_contents = gzip_file.read()

My case was to decompress email messages that are stored in Bullhorn database. The snippet is the following:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')



 for msg in cursor.fetchall():
    #magic in the second parameter, use negative value for deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

Just add headers 'Accept-Encoding': 'identity'

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849

继续阅读：python zlib

zlib.error: Error -3 while decompressing: incorrect header check

choosing windowBits

examples

automatic header detection (zlib or gzip)

using `gzip` instead

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

choosing windowBits

examples

automatic header detection (zlib or gzip)

using gzip instead

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

using `gzip` instead

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？