What does "no collisions have been found yet for this hashing method" even mean?
I mean I don't need to look for the actual collisions, to know they exist. If there weren't collisions, then how would you have fixed-length results? That's why I don't understand what people mean when they claim 'md5 is insecure! some开发者_JS百科one found collisions!', or something like that.
The only thing I can think of, is that the collision search only looks for dictionary words, eg: If 'dog' and 'house' share the same hash, it would be a stupid hashing method IMO. It could also look for strings with a length < X, being X something between 5-10 (passwords that people could remember)
Am I totally wrong?
MD5 is a 128-bit hash, so there are 2^128 possible hashes. If the hash were perfect, then it would in theory require around 2^64 different hash attempts to find a collision (and you would have to store all 2^64 because each new hash would require comparison to all previous values). There isn't 2^64 bits of storage on the planet, so you would be safe.
The attacks on MD5 allow collisions to be found with significantly less than 2^64 hashes and significantly less than 128 x 2^64 bits of storage. That's why MD5 is considered broken.
Currently there are no similar attacks that work on full-strength SHA-1, but it's expected that such attacks will be publicly known within a few years.
As you know, a collision is the term for the situation where two different things (e.g. documents) hash to the same value.
Clearly, collisions are always theoretically possible for a secure hashing algorithm. But the security of secure hashing comes from:
- using a large domain of possible hash values, and
- using a hashing algorithm with the property that trial and error is close to the best way to produce a document with a given hash.
If both of these criteria are satisfied, then the probability of someone being able to manufacture a collision for a given document is vanishingly small. This is sufficient to make it impractical to (for example) change the content of a document with a digital signature.
The problem is that clever people have figured out a way (or ways) that are a LOT faster than trial and error for creating documents whose MD5 signatures collide. Hence they can defeat digital signatures, and similar uses of MD5 to provide security.
FOLLOWUP
This quote comes from the Wikipedia page on MD5:
MD5 makes only one pass over the data, so if two prefixes with the same hash can be constructed, a common suffix can be added to both to make the collision more likely to be accepted as valid data by the application using it. Furthermore, current collision-finding techniques allow to specify an arbitrary prefix: an attacker can create two colliding files that both begin with the same content. All the attacker needs to generate two colliding files is a template file with a 128-byte block of data aligned on a 64-byte boundary that can be changed freely by the collision-finding algorithm.
I don't completely understand this, but it looks like a recipe for producing files with (different) meaningful content and the same signature.
In practice, it's not about whether a single sample was found, but about a method. These can be either based on some property "if you hash values of length N, ending with ..., etc. you will get the same hash" (silly example), or based on some algorithm "having this hash / value, this is how you get a new value with the same hash".
Collisions will of course always exist, but the interesting problem is how to find them. I'm not sure what is the source of that claim you quoted, but I'm pretty sure it was supposed to actually mean "no practical way to find collisions has been found yet for this hashing method".
When you see "No collisions found" for the SHA-256 hash, for example, it really means that no hash collisions have ever been found. You are right that theoretically collisions exists, and there may already have happened a SHA-256 collision that no-one noticed, but this is irrelevant.
To find a collision by chance, you would need on average 18 quintillion of hash attempts for a MD5 hash, and 340 undecillion attempts for a SHA-256 hash, already accounting for the birthday problem.
As vy32 said, it is computationally unfeasible compute, store and compare so many hashes. So, in order to find a collision, you need a method that is many orders of magnitude faster than the random trial and error one. If there exists such a method for a secure hash, the hash is considered broken, at least in regards to general collision resistance.
So, to say "Someone found a collision in this xxxbit hash" is in fact synonymous of saying "A practical method of finding collisions was found for this hash, making it insecure". The alternative is a cosmically unlikely event, and would be reported in another way.
精彩评论