HashAlgorithm implementation
I need a hash function to check file versions (basically, checking whether the client-side file is the same as the server-side one).
My problem is that there are half a dozen implementations of HashAlgorithm in the .NET library, and I'm a bit lost.
- System.Security.Cryptography.KeyedHashAlgorithm
- System.Security.Cryptography.MD5
- System.Security.Cryptography.RIPEMD160
- System.Security.Cryptography.SHA1
- System.Security.Cryptography.SHA256
- System.Security.Cryptography.SHA384
- System.Security.Cryptography.SHA512
I'm looking for a fast algorithm with a relatively short output size. Security is not really a concern here.
Thanks!
Since it isn't a security issue, MD5 will probably serve your purposes. It is pretty standard for file content hashing.
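To make the idea concrete, here is a minimal sketch of MD5-based file comparison. It is written in Python rather than .NET purely for illustration; the function names `file_md5` and `same_content` and the 64 KiB chunk size are assumptions of this sketch, not something from the question. In .NET the same pattern would presumably go through `MD5.Create()` and `ComputeHash` on a file stream.

```python
import hashlib


def file_md5(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file is never fully loaded in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(client_path, server_path):
    """True when both files produce the same MD5 digest."""
    return file_md5(client_path) == file_md5(server_path)
```

In practice the server would likely publish its digest once, and the client would compare its own digest against that value instead of hashing both files on one machine.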
Of all of the above, MD5 is the simplest and fastest.
BTW, for the problem you've described you don't need a cryptographic hash function; any hash function will do. So you could use a checksum such as CRC32 (or a faster one, Adler32).
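As a rough illustration of the checksum route, the sketch below computes CRC32 or Adler32 over a file using Python's standard `zlib` module. The helper name `file_checksum` and its parameters are assumptions made for this example. Keep in mind that a 32-bit checksum only guards against accidental differences and leaves far more room for collisions than a 128-bit hash.

```python
import zlib


def file_checksum(path, algorithm="crc32", chunk_size=64 * 1024):
    """Return a 32-bit checksum of the file at `path` (hypothetical helper)."""
    if algorithm == "crc32":
        update, value = zlib.crc32, 0    # CRC32 running value starts at 0
    elif algorithm == "adler32":
        update, value = zlib.adler32, 1  # Adler32 running value starts at 1
    else:
        raise ValueError("algorithm must be 'crc32' or 'adler32'")
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the previous value back in to continue the running checksum.
            value = update(chunk, value)
    return value & 0xFFFFFFFF
```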
For performance, measure. All hash functions are "fast", for some notion of speed. Among those you list, MD5 is the fastest, but this does not mean that the others are not "fast enough". The slowest should be SHA-512 with a managed implementation on a 32-bit VM (with a 64-bit VM, SHA-512 gets quite a boost, and SHA-256 becomes the slowest); it should still be able to process something like 30 Mbytes of data per second on a common PC, which is not exactly slow either.
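A measurement along those lines could be sketched as follows. This is Python, whose `hashlib` calls native code, so the absolute numbers will not match a managed .NET implementation; only the measuring approach carries over. The function name `throughput_mb_per_s`, the 8 MB payload and the number of rounds are arbitrary choices for this sketch.

```python
import hashlib
import time


def throughput_mb_per_s(name, payload, rounds=3):
    """Hash `payload` `rounds` times and return the best observed MB/s."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(name, payload).digest()
        # Keep the fastest run; it is the one least disturbed by other load.
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1_000_000


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes as test input
    for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
        print(f"{name:8s} {throughput_mb_per_s(name, payload):8.1f} MB/s")
```

If every algorithm turns out to be much faster than the disk or network feeding it, hashing speed is unlikely to be the bottleneck.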
When in doubt, use SHA-256. Consider something else only if you duly demonstrate, in an actual experiment, that hashing speed is a bottleneck for your application and you can show that you really do not have a security issue with a cryptographically broken hash function. This is the proper order of things, because assessing performance is way easier than assessing security, so it is much safer to go for the good security first. There again, apart from choosing MD5 as a faster function, you could also imagine importing a managed MD4 implementation (there is one there): MD4 is even more broken than MD5, but is also even faster. And/or you could try a bit of native code (on hash function implementations, native code is typically 2 to 4 times faster than managed code).
If you need a shorter output you can simply truncate. This mechanically lowers security so you should do that only if your usage of the hash function is not security related.
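Truncation can be sketched like this (again in Python for illustration; `short_fingerprint` and the default of 8 bytes are assumptions of the example). It keeps the leading bytes of a SHA-256 digest, so the same prefix length must be used on both sides of the comparison.

```python
import hashlib


def short_fingerprint(data, length_bytes=8):
    """Return the first `length_bytes` bytes of SHA-256(data) as hex."""
    if not 1 <= length_bytes <= hashlib.sha256().digest_size:
        raise ValueError("length_bytes must be between 1 and 32")
    # Slicing the raw digest keeps a fixed prefix; shorter means more collisions.
    return hashlib.sha256(data).digest()[:length_bytes].hex()
```

For example, `short_fingerprint(b"abc")` yields a 16-character hex string instead of the full 64 characters.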
Well, MD5 is fast, but it is strongly discouraged these days.
There is a tendency to be lax about security ("I'm not bothered about security so much" - I've done it, we all do), but if you really do not need any security then go for MD5.
Otherwise look at the SHA algorithms. SHA-1 is used a lot. I'm no crypto expert, but I think the others have longer outputs (and SHA-384 and SHA-512 use larger blocks as well) and are probably a bit slower. Some reading on the differences can be found at http://en.wikipedia.org/wiki/SHA-1 and the pages linked from it.
Note: an effective way of shortening a hash just for comparison purposes (e.g. to check whether files are the same and have not been tampered with) is to take a subset of the chars from the hash. Just make sure you take them from the same indexes each time (e.g. chars 0-5, or chars 5, 11 and 13; you get the idea).