Encrypting a file with RSA in Python
I'm implementing file encryption with RSA, using PyCrypto.
I know this is not ideal: first, RSA is very slow, and second, PyCrypto's RSA can only encrypt 128 characters at a time, so you have to split the file into 128-character chunks.
This is the code so far:
from Crypto.PublicKey import RSA
file_to_encrypt = open('my_file.ext', 'rb').read()
pub_key = open('my_pub_key.pem', 'rb').read()
o = RSA.importKey(pub_key)
to_join = []
step = 0
while 1:
    # Read 128 characters at a time.
    s = file_to_encrypt[step*128:(step+1)*128]
    if not s: break
    # Encrypt with RSA and append the result to list.
    # RSA encryption returns a tuple containing 1 string, so i fetch the string.
    to_join.append(o.encrypt(s, 0)[0])
    step += 1
# Join the results.
# I hope the \r\r\r sequence won't appear in the encrypted result,
# when i explode the string back for decryption.
encrypted = '\r\r\r'.join(to_join)
# Write the encrypted file.
open('encrypted_file.ext', 'wb').write(encrypted)
So my question is: are there any better methods for using private/public key encryption on files?
I heard about Mcrypt and OpenSSL, but I don't know if they can encrypt files.
Public-key cryptography is usually used for small amounts of data only. It is slow, and can be hard to use right. The usual practice is to use other methods to reduce the asymmetric problem to one where the security is provided by a shared key, then use public-key cryptography to protect that shared key. For example:
- To encrypt a file, randomly generate a secret key for a block or stream cipher (e.g. AES). Store the data encrypted with this cipher, and store the secret key encrypted with the public key alongside the encrypted payload.
- To sign a file, compute a cryptographic digest (e.g. SHA-256). Sign the digest of the file with the private key and store that alongside the file.
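The digest step of the signing bullet can be sketched with the standard library alone. This is a minimal sketch in Python 3, assuming SHA-256 and a 64 KiB read size. The name `file_digest` is made up for illustration, and signing the digest with the private key is left to whatever crypto library you use.

```python
import hashlib
import io

def file_digest(stream, chunk_size=64 * 1024):
    """Return the SHA-256 digest of a binary stream, read chunk by chunk."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b''
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        h.update(chunk)
    return h.digest()

# Usage: the 32-byte digest, not the file itself, is what gets signed.
digest = file_digest(io.BytesIO(b'example file contents'))
```

Because the file is read in chunks, memory use stays constant no matter how large the file is.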
So here's a sketch of what encryption can look like (warning: untested code, typed directly in the browser):
import os
from Crypto.Cipher import AES
from Crypto.PublicKey import RSA
import Crypto.Util.number

def encrypt_file(rsa, input, output):
    # Generate secret key
    secret_key = os.urandom(16)
    # Padding (see explanations below)
    plaintext_length = (Crypto.Util.number.size(rsa.n) - 2) // 8
    padding = b'\xff' + os.urandom(16)
    padding += b'\0' * (plaintext_length - len(padding) - len(secret_key))
    # Encrypt the secret key with RSA (PyCrypto returns a tuple; keep item 0)
    encrypted_secret_key = rsa.encrypt(padding + secret_key, None)[0]
    # Write out the encrypted secret key, preceded by a length indication
    output.write(('%d\n' % len(encrypted_secret_key)).encode('ascii'))
    output.write(encrypted_secret_key)
    # Encrypt the file (see below regarding iv)
    iv = b'\x00' * 16
    aes_engine = AES.new(secret_key, AES.MODE_CBC, iv)
    # CBC works on whole 16-byte blocks, so pad the data first
    data = input.read()
    pad_len = 16 - len(data) % 16
    data += bytes(bytearray([pad_len])) * pad_len
    output.write(aes_engine.encrypt(data))
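The output layout written above (a length line, the encrypted secret key, then the AES ciphertext) has to be taken apart again before decryption. A minimal sketch of that parsing step in Python 3, standard library only; `split_container` is a hypothetical helper name, and the RSA and AES decryption themselves are not shown:

```python
import io

def split_container(stream):
    """Split the encrypted file into (encrypted_secret_key, ciphertext)."""
    # The first line is the decimal length of the RSA-encrypted key
    key_length = int(stream.readline().decode('ascii'))
    encrypted_secret_key = stream.read(key_length)
    if len(encrypted_secret_key) != key_length:
        raise ValueError('truncated encrypted key')
    # Everything that remains is the AES-CBC ciphertext
    ciphertext = stream.read()
    return encrypted_secret_key, ciphertext

# Usage with a fake container: a 4-byte "key" followed by the payload
fake = io.BytesIO(b'4\n' + b'KKKK' + b'payload bytes')
key_blob, body = split_container(fake)
```

The length prefix is what makes this unambiguous: unlike a separator string, it cannot collide with bytes that happen to occur in the encrypted data.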
The iv is an initialization vector for the CBC mode of operation. It needs to be unique (and, for CBC, unpredictable) per key per message. Normally, it's sent alongside the data in cleartext. Here, since the key is only ever used once, you can use a known IV.
The API of the block cipher is described in PEP 272. Unfortunately, it only supports all-at-once encryption. For large files, it would be better to encrypt chunk by chunk; you can encrypt as little as a block at a time (16 bytes for AES), but you need a better crypto library for that.
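To give an idea of what chunk-by-chunk processing involves, here is a minimal sketch in Python 3 of the reading side only. It assumes a cipher object that accepts data incrementally, and it assumes PKCS#7-style padding on the last chunk, which is my choice rather than something the code above specifies. `iter_padded_chunks` is a hypothetical name.

```python
import io

BLOCK_SIZE = 16  # AES block size in bytes

def iter_padded_chunks(stream, chunk_size=64 * 1024):
    """Yield block-aligned chunks; the final chunk carries PKCS#7-style padding."""
    if chunk_size % BLOCK_SIZE:
        raise ValueError('chunk_size must be a multiple of the block size')
    while True:
        # Assumes read() only returns a short result at end of file,
        # which holds for regular files and BytesIO
        chunk = stream.read(chunk_size)
        if len(chunk) < chunk_size:
            # Last (possibly empty) chunk: pad up to the next block boundary
            pad_len = BLOCK_SIZE - len(chunk) % BLOCK_SIZE
            yield chunk + bytes([pad_len]) * pad_len
            return
        yield chunk

chunks = list(iter_padded_chunks(io.BytesIO(b'x' * 40), chunk_size=32))
```

Each yielded chunk is a whole number of blocks, so it could be handed to the cipher as it arrives instead of holding the entire file in memory.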
Note that in general, you should not directly encrypt data with RSA. The most obvious concern is that the attacker knows the public key and can therefore attempt to guess the plaintext (if the attacker thinks the plaintext may be swordfish, then the attacker can encrypt swordfish with the RSA public key, and compare the result with the output of the RSA encryption). Another concern which would arise if you wanted to send the file to multiple recipients is that if the RSA encryption step is deterministic, then the attacker can tell that the plaintexts are the same because the ciphertexts are the same. The normal defense against these problems is to use a padding scheme, which consists of adding some random secret data to the plaintext; this data is called padding. The attacker then cannot guess the random data, and sees different outcomes for every encryption because the same plaintext is never encrypted twice; as far as the legitimate recipient is concerned, the padding is just data that can be thrown away.
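The guessing attack and the effect of random padding can be shown with plain integers. This is a toy sketch in Python 3: the modulus is built from two small Mersenne primes and is far too short for real use, the padding is just 8 random bytes appended to the message (not OAEP), and all names are invented for the illustration.

```python
import os

# Toy public key: the primes are Mersenne primes, far too small for real use
N = (2**89 - 1) * (2**127 - 1)
E = 65537

def textbook_encrypt(data):
    """Deterministic RSA with no padding: same input, same output."""
    return pow(int.from_bytes(data, 'big'), E, N)

def padded_encrypt(data):
    """Append 8 random bytes before encrypting (toy padding, not OAEP)."""
    return textbook_encrypt(data + os.urandom(8))

intercepted = textbook_encrypt(b'swordfish')

# The attacker only needs the public key to confirm a guess
guess_confirmed = textbook_encrypt(b'swordfish') == intercepted

# With random padding, two encryptions of the same message differ
padded_differ = padded_encrypt(b'swordfish') != padded_encrypt(b'swordfish')
```

Without padding the attacker confirms the guess by comparing ciphertexts; with padding the comparison fails unless the attacker also guesses the random bytes.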
At first glance, the above concerns may not seem to apply in this scenario. However, there are other weaknesses that can arise from using RSA unprotected. In particular, if the public exponent is very small (not the case here as PyCrypto uses 65537) or you encrypt the same material for many different recipients (again, probably not the case here since each message has its own secret key), then a simple mathematical calculation would allow the attacker to recover the RSA plaintext. To avoid this attack, the value that is encrypted with RSA needs to be “close enough” to the RSA modulus, so that the encryption operation actually performs a modular exponentiation. The padding I propose ensures that by making the highest-order byte that fits 0xff; this is believed to be safe, although in the real world you should use an approved padding mode (OAEP).
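The small-exponent weakness can be illustrated with plain integers. This is a toy sketch in Python 3 under stated assumptions: the modulus is built from two Mersenne primes, the exponent is 5 (standing in for "very small"; 3 happens not to be a valid exponent for these particular primes), and `integer_root` is a helper written for this example.

```python
import os

# Toy modulus from two Mersenne primes; E = 5 stands in for "very small"
N = (2**607 - 1) * (2**1279 - 1)
E = 5

def integer_root(value, k):
    """Return the largest integer r with r**k <= value (binary search)."""
    lo, hi = 0, 1 << (value.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= value:
            lo = mid
        else:
            hi = mid - 1
    return lo

secret_key = os.urandom(16)
m = int.from_bytes(secret_key, 'big')

# Unpadded: m**E is smaller than N, so the "modular" power never wraps
c_raw = pow(m, E, N)
recovered = integer_root(c_raw, E)   # equals m, no private key needed

# Padded as described above: 0xff in the highest-order byte that fits
plaintext_length = (N.bit_length() - 2) // 8
padding = b'\xff' + os.urandom(16)
padding += b'\0' * (plaintext_length - len(padding) - len(secret_key))
m_padded = int.from_bytes(padding + secret_key, 'big')
c_padded = pow(m_padded, E, N)
root = integer_root(c_padded, E)     # not m_padded: the power wrapped mod N
```

A 16-byte key raised to the fifth power is at most 640 bits, far below the 1886-bit modulus, so an ordinary integer root undoes the encryption. Once the plaintext is padded to nearly the size of the modulus, the power exceeds the modulus and the root no longer reveals anything.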