Perceptual hash function for text [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

Does anyone know a simple perceptual hash algorithm for text? I took a look at the pHash function ph_texthash, but I want a simpler algorithm, preferably in Python. Thank you!


A blog post about perceptual hash functions (in the imaging context):

  • http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html

and some related Python code (it deals with images, not text, but may be adaptable):

  • http://sprunge.us/WcVJ?py (53 LOC)

As I understand it from this short presentation on Perceptual Hashing of Textual Content, there are numerous approaches, which differ along several dimensions (the level of the text being analyzed, a linguistic versus a statistical approach, the model chosen to represent the text, ...), and the right one depends on your domain and the problem you are trying to solve.

Also you might look into Locality-sensitive hashing, which

is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items)
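To make the idea concrete, here is a minimal sketch of SimHash, one well-known locality-sensitive hashing scheme that is often used for near-duplicate text detection. It is not the algorithm behind ph_texthash, and the details are my own assumptions for illustration: the function names `simhash` and `hamming_distance`, the word-level tokenization, the use of MD5 as the per-token hash, and the 64-bit fingerprint width.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a `bits`-wide SimHash fingerprint of `text` as an int.

    Assumptions (not from any particular library): tokens are lowercase
    word characters, every token has weight 1, and `bits` is at most 128
    because the per-token hash is a 128-bit MD5 digest.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = [0] * bits  # one signed counter per fingerprint bit
    mask = (1 << bits) - 1
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & mask
        for i in range(bits):
            # A set bit votes +1 for that position, a clear bit votes -1.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    edited = "the quick brown fox jumped over the lazy dog"
    other = "perceptual hashing maps similar inputs to similar outputs"
    print(hamming_distance(simhash(original), simhash(edited)))
    print(hamming_distance(simhash(original), simhash(other)))
```

Two texts that share most of their words usually end up with fingerprints that differ in only a few bits, while unrelated texts tend to differ in roughly half of them, so you compare fingerprints by Hamming distance rather than by equality. Note that this word-level version ignores word order; hashing overlapping word n-grams (shingles) instead of single words is a common way to make it order-sensitive.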
