How to safely de-duplicate files encrypted at the client's side?
Bitcasa claims to provide infinite storage for a fixed fee.
According to a TechCrunch interview, Bitcasa uses client-side convergent encryption, so no unencrypted data ever reaches the server. With convergent encryption, the encryption key is derived from the source data that is to be encrypted.
Basically, Bitcasa uses a hash function to identify identical files uploaded by different users, so that each such file is stored only once on its servers.
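To make the idea concrete, here is a minimal sketch of convergent encryption. It is not Bitcasa's implementation; the function names (`derive_key`, `toy_keystream_xor`, `convergent_encrypt`) and the choice of SHA-256 are assumptions made for illustration. The keystream XOR is only a toy stand-in for a real deterministic cipher (for example AES with a key-derived IV) and must not be used to protect real data.

```python
import hashlib


def derive_key(plaintext: bytes) -> bytes:
    """Convergent key: a hash of the plaintext itself (SHA-256 is an assumption)."""
    return hashlib.sha256(plaintext).digest()


def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic cipher: XOR data with a SHA-256 based keystream.

    Illustrative stand-in for a real cipher; applying it twice with the
    same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def convergent_encrypt(plaintext: bytes):
    """Return (key, ciphertext, storage_id) for a file.

    The client keeps the key; the server only sees the ciphertext and
    an ID computed from the ciphertext.
    """
    key = derive_key(plaintext)
    ciphertext = toy_keystream_xor(key, plaintext)
    storage_id = hashlib.sha256(ciphertext).hexdigest()
    return key, ciphertext, storage_id
```

Two users who encrypt the same file end up with the same key, the same ciphertext and therefore the same storage ID, which is what lets the server de-duplicate without ever seeing the plaintext.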
I wonder how the provider can ensure that no two different files get mapped to the same encrypted file or the same encrypted data stream, since hash functions aren't bijective.
Technical question: what do I have to implement so that such a collision can never happen?
Most deduplication schemes make the assumption that hash collisions are so unlikely to happen that they can be ignored. This allows clients to skip reuploading already-present data. It does break down when you have two files with the same hash, but that's unlikely to happen by chance (and you did pick a secure hash function to prevent people from doing it intentionally, right?)
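As a rough sketch of what "ignore collisions" looks like in practice, the code below skips the upload when the hash is already known, and also estimates how unlikely an accidental collision is using the standard birthday bound (at most n(n-1)/2 divided by 2^bits for n files). The names `DedupStore`, `upload` and `collision_probability_bound` are hypothetical, and SHA-256 is an assumption, not something taken from any particular provider.

```python
import hashlib


class DedupStore:
    """In-memory stand-in for the server side of a de-duplicating store."""

    def __init__(self):
        self.blobs = {}
        self.transfers = 0  # how many times data was actually sent

    def has(self, blob_id: str) -> bool:
        return blob_id in self.blobs

    def put(self, blob_id: str, data: bytes) -> None:
        self.transfers += 1
        self.blobs[blob_id] = data


def upload(store: DedupStore, data: bytes) -> str:
    """Send data only if its hash is not already on the server.

    This trusts the hash: two different files with the same hash would
    silently be treated as one.
    """
    blob_id = hashlib.sha256(data).hexdigest()
    if not store.has(blob_id):
        store.put(blob_id, data)
    return blob_id


def collision_probability_bound(num_files: int, hash_bits: int) -> float:
    """Birthday upper bound on the chance of any accidental collision."""
    pairs = num_files * (num_files - 1) // 2
    return pairs / 2 ** hash_bits
```

For example, `collision_probability_bound(2**40, 256)` (about a trillion files with a 256-bit hash) is far below 1e-50, which is why most schemes accept the risk. The bound only covers accidental collisions; deliberate ones depend on the hash function still being secure.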
If you insist on being absolutely sure, all clients must reupload their data (even if it's already on the server), and once this data is reuploaded, you must check that it's identical to the currently-present data. If it's not, you need to pick a new ID rather than using the hash (and sound the alarm that a collision has been found in SHA1!)
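The "absolutely sure" variant could look roughly like the sketch below: the data is always sent, the server compares it byte for byte with what it already holds, and on a mismatch it assigns a fresh ID instead of the bare hash. `VerifyingStore`, `sha256_id`, the `id_func` parameter and the `-1`, `-2` suffix scheme are all hypothetical choices for illustration. In a client-side encrypted setup the bytes being compared would be ciphertext.

```python
import hashlib


def sha256_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class VerifyingStore:
    """Store that never trusts the hash alone."""

    def __init__(self, id_func=sha256_id):
        self.id_func = id_func  # injectable so a weak hash can be tested
        self.blobs = {}
        self.collisions = []    # base IDs for which a collision was seen

    def put(self, data: bytes) -> str:
        base_id = self.id_func(data)
        blob_id = base_id
        suffix = 0
        while blob_id in self.blobs:
            if self.blobs[blob_id] == data:
                return blob_id  # genuine duplicate, reuse the stored copy
            # Same ID but different content: try the next fallback ID.
            suffix += 1
            blob_id = f"{base_id}-{suffix}"
        if suffix:
            self.collisions.append(base_id)  # "sound the alarm"
        self.blobs[blob_id] = data
        return blob_id
```

The price is that every client re-uploads everything, so only storage is saved, not bandwidth. Clients must also remember the ID returned by `put`, since it may differ from the plain hash.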