开发者

Hash function that maps similar inputs to similar outputs?

Is there a has开发者_Python百科h function where small changes in the input result in small changes in the output? For example, something like:

hash("Foo") => 9e107d9d372bb6826bd81d3542a419d6
hash("Foo!") => 9e107d9d372bb6826bd81d3542a419d7 <- note small difference


I wouldn't call this a hash because the point of a hash is exactly the opposite. However, with your stated goal of small changes in input producing small changes in output, I would look at using either a soundex function or the Ratcliff algorithm.


I would recommend the simhash, algorithm by Mark Manasse.


A trivial solution would be be to XOR all bytes module N. E.g. for a 64 bits hash, you'd XOR (input[0] ^ input[8] ^ input[16]) + 256*(input[1] ^ input[9] ^ input[17]) etc. So, "Foo" hashes to "Foo\0\0\0\0\0" and "Foo!" hashes to "Foo!\0\0\0\0".


Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data. LSH hashes input items so that similar items map to the same “buckets” with high probability:

https://en.wikipedia.org/wiki/Locality-sensitive_hashing

Also see: https://en.wikipedia.org/wiki/Perceptual_hashing

Here is a nice example of perceptual hashing on DNA sequences:

http://arxiv.org/pdf/1412.5517.pdf

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜