How much data (many MB) can I uniquely identify using MD5

2023-01-28 08:26

I've got millions of data records that are each about 2MB in size. Each of these records is stored in a file, and there is a set of other data associated with that record (stored in a database).

When my program runs I'll be presented, in memory, with one of the data records and need to produce the associated data. To do this I'm imagining taking an MD5 of the memory, then using this hash as a key into the database. The key will help me locate the other data.

What I need to know is whether an MD5 hash of the data contents is a suitable way to uniquely identify a 2MB piece of data. In other words, can I use an MD5 hash without worrying too much about collisions?

I realize there is a chance of collision. My concern is how likely a collision is across millions of 2MB data records. Is a collision a likely occurrence? How does it compare to hard disk failure or other computer failures? How much data can MD5 safely be used to identify? What about millions of GB files?

I'm not worried about malice or data tampering. I've got protections in place, so I won't be receiving manipulated data.

This boils down to the so-called birthday paradox. That Wikipedia page has simplified formulas for evaluating the collision probability. It will be a very small number.
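To put a number on it, here is a minimal sketch of the usual birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), with N = 2^128 possible MD5 values. It assumes the hashes behave like uniformly random values, which is reasonable only because tampering is ruled out in the question. The function name `collision_probability` is made up for this illustration.

```python
import math

def collision_probability(n_items: int, hash_bits: int = 128) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Assumes hash outputs are uniformly distributed and that nobody is
    deliberately crafting colliding inputs.
    """
    space = 2.0 ** hash_bits  # N: number of possible hash values
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the result is extremely small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

for n in (10**6, 10**9, 10**12):
    print(f"{n:.0e} records: p ~ {collision_probability(n):.3e}")
```

For a million records this comes out on the order of 10^-27. Under this model only the number of records matters, not their size, so 2MB records and GB-sized files give the same estimate for the same record count.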

The next question is how you deal with a collision probability of, say, 10^-12; see this very similar question.
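One possible way to deal with it, if the residual risk bothers you, is to use the MD5 as the lookup key and confirm a hit by comparing the actual bytes. The sketch below is only an illustration under assumptions: the `RecordIndex` class and `md5_key` helper are hypothetical names, and a plain dictionary stands in for the database and the files on disk.

```python
import hashlib

def md5_key(record: bytes) -> str:
    """Hex MD5 digest of the record contents, used as the lookup key."""
    return hashlib.md5(record).hexdigest()

class RecordIndex:
    """Hypothetical in-memory stand-in for the database described above."""

    def __init__(self) -> None:
        # key -> (record bytes, associated data); a real system would keep
        # the bytes in a file and only the key and metadata in the database.
        self._by_key = {}

    def add(self, record: bytes, associated) -> None:
        key = md5_key(record)
        existing = self._by_key.get(key)
        if existing is not None and existing[0] != record:
            # Same digest, different contents: a genuine collision.
            raise ValueError(f"MD5 collision on key {key}")
        self._by_key[key] = (record, associated)

    def lookup(self, record: bytes, verify: bool = True):
        entry = self._by_key.get(md5_key(record))
        if entry is None:
            return None
        stored, associated = entry
        # Optional byte-for-byte check; costs a read of the stored record.
        if verify and stored != record:
            raise ValueError("digest matched but contents differ")
        return associated

index = RecordIndex()
index.add(b"example record contents", {"owner": "alice"})
print(index.lookup(b"example record contents"))
print(index.lookup(b"unknown record"))
```

Whether the extra comparison is worth a 2MB read per lookup is a judgment call; with the probabilities involved, simply accepting the risk is also a defensible choice.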
