Locality Sensitive Hash Implementation? [closed]

2023-02-28 03:31 问答作者：

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussi开发者_JAVA百科on. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

Are there any relatively simple to understand (and simple to implement) locality-sensitive hash examples in C/C++/Java/C#?

I'd like to learn more about the concept and so want to try an implementation on a few text files just to see how it works, so I don't need anything high-performance or anything... just an example of a hash function that returns similar hashes for similar inputs. I can learn more from it by example afterwards. :)

For strings you can use approximate matching algorithm.

Generate a random string
For all the strings compute their distance from that random shared string using an algorithm like http://www.dotnetperls.com/levenshtein

If the strings are equidistant from a reference string then chances are that they are similar to each other. And there you go you have a locality senitive hash implementation for strings.

You can create different hash buckets for a range of distances.

EDIT: You can try other variations of string distance. A simpler algorithm would just return no. of common characters between two strings.

Well there is an excellent into article at MSDN blogs here: http://blogs.msdn.com/b/spt/archive/2008/06/11/locality-sensitive-hashing-lsh-and-min-hash.aspx

Also there is at least once C++ library which you can inspect the source code of here: http://sourceforge.net/projects/lshkit/

There is also a Java Implementation on Hadoop. it does a good job on documents.

it's called LikeLike

Currently Likelike supports only Min-Wise independent permutations. Min-Wise independent permutations is applied to the recommendation of Google News

I realise you explicitly asked for C/C++/C#, but there is a Python port of the nilsimsa hash which might be easier to grok than other, larger libraries.

继续阅读：c hash locality-sensitive-hash

Locality Sensitive Hash Implementation? [closed]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？