One way hash (not for crypto/security), use SHA256 (not MD5, SHA-1)?
On a new system, we require a one-way-hash to compute a digital signature from binary input (e.g., a kilobyte of text, or larger text-and-binary files). The need is similar to how Scons (build system) hashes command-lines and source files, and how Git (version control system) hashes files to compute a signature for storage/synchronization.
Recall that Scons uses MD5, and Git uses SHA-1.
While MD5 and SHA-1 have been "broken", neither Scons nor Git are using their hashes specifically for security (e.g., it's not to store passwords), so general practice still considers those algori开发者_如何学Pythonthms acceptable for that usage. (Of course, this is partially a rationalization due to legacy adoption.)
QUESTION: Would you use SHA256 (not MD5 nor SHA-1) for a (non-crypto/security) one-way hash in a new system?
The concerns are:
- MD5 and SHA-1 have a long history of adoption
- SHA256 is relatively new (not as much history), but seems to be currently recommended for new work (but "stronger" algorithm strength is not specifically required for my application)
- SHA256 is more time-expensive to compute
- SHA256 produces a longer key (these will be used as dir/file names, and stored within index files), but I suppose I could truncate the produced key (hash is less strong, but should be sufficient), or just assume storage is cheap and file systems can handle it.
I'd be particularly interested in an answer consistent with the Scons or Git communities saying, "We'll keep ours forever!" or "We want to move to a new hash as soon as practical!" (I'm not sure what their plans are?)
Yes, I would use SHA-256. SHA-256 had a lot more than security purposes in mind; in fact one of the reasons that SHA1 needed to be replaced was for the very reason you need a hash function. A hash algorithm produces a finite site output; while having an undetermined amount of input. Eventually there will be a collision. The larger the output; the less likely of a collision (when using a proper hash algorithm).
Git went with SHA1 because they use it as file names; and they wanted it to be small and compact. SHA256 produces a much larger digest; consuming more disk space and more data to transmit over the wire. This question specifically addresses what would happen if git were to encounter collisions.
To look at your points:
- SHA256 has been in the wild long enough that if there were problems; we should have seen them by now.
- It isn't "stronger" per-se, it's less likely to produce a collision (if that is your criteria for stronger; then yes it is stronger).
- SHA-256 is slower; yes. Much slower? Depends on what your needs are. For 95% of people; it's performance is acceptable assuming you're using a proper implementation.
- In general, truncating the hash of SHA2 is an okay thing to do.
The probability of a non-malicious collision is vanishingly small, even with MD5. Here is a thought experiment:
A well stuffed hard drive may have 1M files. For the experiment, imagine there are 10M files. Let's say that the world population is 10.000M persons, each with one computer, and every file is different.
We would be contending with a number of different files of 10E6 * 10E9 = 1E17, < 2^57
The probability of an MD5 collision in such a far fetched case would be less than 1 in 2^71, or less than one in aproximately 2E21! To put this in perspective, for a collision probability of 1 in 10M we would have to repeat the experiment roughly 2E14 times, which is to say replacing every file, every hour since the big bang, and then keep going for a few more billion years.
2E14 / 24 / 365 / 13500E6 = 1.69
Of course, with SHA1 or SHA256, the probabilities would be even smaller, and there would also be resistance to a malicious attack -- unlike MD5, it would not be possible (now) that someone constructed files purposely for having the same hash.
Depends on what you're doing. It takes a lot longer to compute SHA-256 hash. Not a big deal for many applications, but what if your app were trying to compute hundreds or thousands per minute? You might find the additional time is too much.
On the flip-side though, SHA-1 has a much higher chance of a collision than SHA-256. Understand though that such chances are miniscule (1 in 2^160/2 I think for SHA-1) and likely would never be hit by most apps. However, the more things you hash, the higher the chance. If you're hashing millions or billions of things, this might be a concern.
For increased security (however this might be defined) with reduced opportunities for attackers or accidents you might want to consider salting or using keyed (HMAC) variants. Also small tricks like Git's prefix which includes the message length or a CRC can make it harder for an attacker to device a message not only having the same hash, but also a valid format.
You can also think about larger hashes like the trees used by Glacier (Amazon) or Branch Cache Hash (Microsoft) or some peer-to-peer networks like BitTorrent or other Merkle or Tiger Tree based constructs.
精彩评论