When compressing and encrypting, should I compress first, or encrypt first? [closed]
Closed 1 year ago.
If I were to AES-encrypt a file, and then ZLIB-compress it, would the compression be less efficient than if I first compressed and then encrypted?
In other words, should I compress first or encrypt first, or does it matter?
Compress first. Once you encrypt the file you will generate a stream of seemingly random data, which will not be compressible. The compression process depends on finding compressible patterns in the data.
Compression before encryption is certainly more space efficient, but at the same time it can be less secure. That's why I would disagree with the other answers.
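Most compressed formats begin with fixed "magic" header bytes, which hand an attacker a piece of known plaintext.
More seriously, the length of compressed data depends on its content, so the size of the ciphertext can leak information about the plaintext. The CRIME attack on SSL/TLS is an example: when an attacker can inject chosen data that gets compressed together with a secret, a correct guess compresses better and the encrypted message comes out shorter.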
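To make the length leak concrete, here is a minimal sketch of the idea behind CRIME. It is not the real attack: the request layout, the cookie value, and the helper name `observed_length` are all made up for illustration, and it assumes the cipher preserves length (as a stream cipher or CTR mode would), so the compressed size is what an eavesdropper sees.

```python
import zlib

# Hypothetical secret that the victim's client adds to every request.
SECRET_HEADER = b"Cookie: session=7f3a9c2e1b4d"


def observed_length(injected: bytes) -> int:
    """Size of the compressed request, i.e. what a length-preserving
    cipher would reveal to someone watching the wire."""
    request = (
        b"GET /" + injected + b" HTTP/1.1\r\n"
        b"Host: example.com\r\n" + SECRET_HEADER + b"\r\n\r\n"
    )
    return len(zlib.compress(request))


# The attacker controls the path and tries two guesses of equal length.
right_guess = observed_length(b"Cookie: session=7f3a9c2e1b4d")
wrong_guess = observed_length(b"Cookie: session=QWXZPLMKJHGV")

print("correct guess:", right_guess, "bytes")
print("wrong guess:  ", wrong_guess, "bytes")
```

The correct guess lets the compressor replace the whole secret header with a single back-reference, so that request is shorter. A real attack refines the guess a character at a time, which needs many more requests and some care, because single-character differences do not always change the byte count.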
If your encryption algorithm is any good (and AES, with a proper chaining mode, is good), then no compressor will be able to shrink the encrypted text. Or, if you prefer it the other way round: if you succeed in compressing some encrypted text, then it is high time to question the quality of the encryption algorithm…
That is because the output of an encryption system should be indistinguishable from purely random data, even by a determined attacker. A compressor is not a malicious attacker, but it works by trying to find non-random patterns which it can represent with fewer bits. The compressor should not be able to find any such pattern in encrypted text.
So you should compress data first, then encrypt the result, not the other way round. This is what is done in, e.g., the OpenPGP format.
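A quick way to see the difference between the two orders is sketched below. To keep it runnable with only the standard library, it does not use AES: `keystream_xor` is a toy stream cipher (SHA-256 in counter mode) that I made up as a stand-in, and it should not be used to protect real data. The sample plaintext is also arbitrary.

```python
import hashlib
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher standing in for AES-CTR (illustration only).

    XORs the data with SHA-256(key || nonce || counter) blocks, so the
    same call both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


plaintext = b"The quick brown fox jumps over the lazy dog. " * 2000
key, nonce = os.urandom(32), os.urandom(16)

# Order 1: compress, then encrypt.
compress_then_encrypt = keystream_xor(key, nonce, zlib.compress(plaintext))

# Order 2: encrypt, then compress.
encrypt_then_compress = zlib.compress(keystream_xor(key, nonce, plaintext))

print("plaintext:            ", len(plaintext))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))

# Reversing order 1: decrypt first, then decompress.
recovered = zlib.decompress(keystream_xor(key, nonce, compress_then_encrypt))
```

With a repetitive input like this one, the first order should come out far smaller than the plaintext, while the second should be about the size of the plaintext or slightly larger.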
Compress first. If you encrypt first, your data turns into (essentially) a stream of random bits. Random bits are incompressible: compression looks for patterns in the data, and a random stream, by definition, has no patterns.
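As a rough illustration of "no patterns", the sketch below compares repetitive text with the same amount of random bytes from `os.urandom`, which stand in here for well-encrypted data. The helper names `bits_per_byte` and `compression_ratio` are my own, and byte-frequency entropy only captures the simplest kind of pattern, so treat it as an indicator rather than a proof.

```python
import math
import os
import zlib
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8)."""
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"Random bits are incompressible because compression looks for patterns. " * 1000
noise = os.urandom(len(text))  # stand-in for ciphertext

for label, sample in (("text", text), ("noise", noise)):
    print(f"{label:5s} entropy={bits_per_byte(sample):.3f} bits/byte "
          f"ratio={compression_ratio(sample):.3f}")
```

The text should show low entropy and a tiny ratio. The random bytes should sit close to 8 bits per byte with a ratio of about 1, meaning zlib gains nothing.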
Of course it matters. It's generally better to compress first and then to encrypt.
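zlib uses LZ77 matching followed by Huffman coding. Both work far better on plaintext: LZ77 finds repeated strings, and because symbol frequencies in plaintext are skewed, the Huffman code can give frequent symbols short codes. Ciphertext has no repeats and near-uniform byte frequencies, so the Huffman tree ends up balanced at about 8 bits per byte and nothing is saved. The compression ratio is therefore much better if you compress first.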
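The Huffman half of that argument can be checked with a small sketch. This is a textbook Huffman construction, not zlib's actual implementation (zlib limits code lengths and codes LZ77 symbols, not raw bytes), and the function names are my own.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: code length in bits} for a Huffman code of data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def average_code_length(data: bytes) -> float:
    """Average Huffman code length in bits per input byte."""
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    return sum(freq[s] * lengths[s] for s in freq) / len(data)


skewed = b"it is generally better to compress first and then to encrypt " * 50
uniform = bytes(range(256)) * 50  # every byte value equally often

print("plaintext-like:", round(average_code_length(skewed), 2), "bits/byte")
print("uniform:       ", average_code_length(uniform), "bits/byte")
```

With perfectly uniform frequencies the best possible code is 8 bits per byte, which is no saving at all. Skewed, text-like frequencies give a noticeably shorter average.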
Encryption should still follow compression. Compressed output may look "encrypted", but it is not, and it is easy to recognize as compressed from its header: a ZIP archive, for instance, starts with PK.
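Recognizing compressed data from its first bytes takes only a few lines. The sketch below checks three common signatures: `PK` for ZIP, `1f 8b` for gzip, and the two-byte zlib header, whose low nibble is 8 (deflate) and which is a multiple of 31 when read as a big-endian number. The function name `sniff_container` is made up for this example.

```python
import gzip
import io
import zipfile
import zlib


def sniff_container(blob: bytes) -> str:
    """Guess the compressed format of blob from its leading bytes."""
    if blob[:2] == b"PK":
        return "zip"
    if blob[:2] == b"\x1f\x8b":
        return "gzip"
    # zlib header: compression method 8 in the low nibble of the first
    # byte, and the first two bytes form a multiple of 31.
    if len(blob) >= 2 and blob[0] & 0x0F == 8 and int.from_bytes(blob[:2], "big") % 31 == 0:
        return "zlib"
    return "unknown"


# Build one sample of each format in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("a.txt", "hello")

samples = {
    "zip": zip_buffer.getvalue(),
    "gzip": gzip.compress(b"hello"),
    "zlib": zlib.compress(b"hello"),
    "plain": b"plain text",
}

for name, blob in samples.items():
    print(name, "->", sniff_container(blob))
```

Properly encrypted data has no such signature. Random-looking bytes will still match one of these checks by chance now and then, so a hit is a hint rather than proof.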
zlib doesn't provide encryption natively, which is why I implemented ZeusProtection. The source code is also available on GitHub.
From a practical perspective, I think you should compress first simply because many files are pre-compressed. For example, video encoding usually involves heavy compression. If you encrypt such a video file and then compress it, it has been compressed twice. Not only will the second pass get a dismal compression ratio, it will also burn a great deal of CPU time on large files or streams. As Thomas Pornin and Ferruccio stated, compressing encrypted files has little effect anyway because encrypted data looks random.
I think the best, and simplest, policy may be to compress files only-as-needed beforehand (using a whitelist or blacklist), then encrypt them regardless.
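A minimal sketch of that policy might look like the following. The extension blacklist, the one-byte flag format, and the names `should_compress`, `protect`, and `recover` are all my own assumptions, and the encryption step is passed in as a callable so that any real cipher can be plugged in.

```python
import os
import zlib
from typing import Callable

# Hypothetical blacklist of formats that are already compressed.
ALREADY_COMPRESSED = {".mp4", ".mkv", ".jpg", ".jpeg", ".png", ".mp3", ".zip", ".gz", ".7z"}


def should_compress(filename: str) -> bool:
    """Skip compression for extensions known to be compressed already."""
    extension = os.path.splitext(filename)[1].lower()
    return extension not in ALREADY_COMPRESSED


def protect(filename: str, data: bytes, encrypt: Callable[[bytes], bytes]) -> bytes:
    """Compress only if worthwhile, then encrypt regardless.

    A one-byte flag records whether the payload was compressed.
    """
    if should_compress(filename):
        payload = b"\x01" + zlib.compress(data)
    else:
        payload = b"\x00" + data
    return encrypt(payload)


def recover(blob: bytes, decrypt: Callable[[bytes], bytes]) -> bytes:
    """Reverse protect(): decrypt, then decompress if the flag says so."""
    payload = decrypt(blob)
    body = payload[1:]
    return zlib.decompress(body) if payload[:1] == b"\x01" else body


def no_cipher(blob: bytes) -> bytes:
    """Placeholder so the sketch runs; substitute a real cipher here."""
    return blob


document = b"meeting notes " * 200
stored = protect("notes.txt", document, no_cipher)
print(len(document), "->", len(stored))
```

Because the flag sits inside the encrypted payload, someone looking at the ciphertext cannot read it directly, although the ciphertext size still hints at how well the data compressed.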