Is any substring of a hash (md5, sha1) more "random" than another?
Here's 3 example md5 hashes
$ md5 -s "1" && md5 -s "2" && md5 -s "3"
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3
Say I wanted to take 8 characters from any hash. Is the beginning part of the hash particularly more "random" than the end? middle开发者_JS百科? Or are all substrings equally "random"?
I was curious myself, so I went ahead and wrote a program to test this. You'll need Crypto++ to compile the code.
Disclaimer: When it comes to cryptography, or even just mathematics in general, I know just enough to shoot myself in the foot. So, take the following results with a grain of salt and keep in mind that I only have a cursory knowledge of the tools I'm using.
I only sampled three substrings: the first 8 bytes, the middle 8 bytes, and the last 8 bytes. Long story short, they're equally random.
However, when using a smaller sample space, it appears as if the last 8 bits are slightly more random. The larger the sampling space, the closer all three substrings approach complete randomness.
1000 iterations:
First: 0.995914
Middle: 0.996546
Last: 0.998104
5000 iterations:
First: 0.998387
Middle: 0.998624
Last: 0.999501
10000 iterations:
First: 0.999614
Middle: 0.999457
Last: 1
30000 iterations:
First: 1
Middle: 1
Last: 1
"Randomness" is measured by Crypto++'s MaurerRandomnessTest class. For reference, the executable compiled from the above code has a randomness value of 0.632411
and a copy of Shakespeare's Macbeth downloaded from Project Gutenburg has a randomness value of 0.566991
.
Nitpick: "random" is the wrong word to use here, since hash functions are deterministic.
As for answering what you mean :), a desirable property of hash functions is achieving the Avalanche effect: basically, to have every bit of input cause drastic changes to the output. So, for a well-designed hash, every substring should be affected equally often ("be as random") as any other.
All substrings of a good hash (and md5 is reasonably good despite being cryptographically unsafe) are equally random, so yes, take any bits you like from the string, they should be equally distributed.
Measuring the randomness of the output of a hash function can be done using Statistical tests done on pseudo-random number generators. According to the Handbook of Applied Cryptography §5.4.4 (sample chapters available for free), there are five basic tests:
- Frequency test (monobit test)
- Serial test (two-bit test)
- Poker test
- Runs test
- Autocorrelation test
Then, of course, there's the Maurer's universal statistical test that kurige has already mentioned.
精彩评论