Is there an API or algorithm that matches strings in a block of text and accounts for possible misspellings?

I'm looking for a solution that can process large blocks of user input text and match it against a set of strings that I have stored in a database. The only problem is that the strings in the user input text are frequently misspelled. (The strings in the database are spelled correctly.)

I know modern search engines suggest results that account for misspellings, but I have no idea what those algorithms are called or whether they even apply to my situation.

Firstly, I need to know the names of those algorithms (or what they are generally called). Secondly, I need to know how to apply them. Any ideas?


Use libaspell to find misspelled words, then refine its suggestions with a clustering algorithm (k-means?) or with the Levenshtein distance between strings (http://en.wikipedia.org/wiki/Levenshtein_distance). Your code should also handle incomplete and non-dictionary words if you are searching a parts catalog or a scientific book database.
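The Levenshtein step can be sketched without aspell at all. The code below is a minimal illustration, not a tuned implementation. It computes the edit distance with the standard dynamic-programming recurrence, then slides a window over the words of the input text and reports every window that lies within an edit budget of a stored string. The names `levenshtein` and `find_fuzzy_matches`, as well as the `max_ratio` threshold (allowed edits as a fraction of the term length), are hypothetical choices made for this sketch.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, one row at a time)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the stored row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def find_fuzzy_matches(text, known_terms, max_ratio=0.25):
    """Hypothetical helper: compare every run of words in `text` that has the
    same word count as a known term, and keep it if the edit distance is at
    most int(len(term) * max_ratio). Returns (found, term, distance) tuples."""
    words = re.findall(r"\w+", text.lower())
    matches = []
    for term in known_terms:
        term_words = term.lower().split()
        target = " ".join(term_words)
        budget = int(len(target) * max_ratio)
        n = len(term_words)
        for start in range(len(words) - n + 1):
            candidate = " ".join(words[start:start + n])
            distance = levenshtein(candidate, target)
            if distance <= budget:
                matches.append((candidate, term, distance))
    return matches


if __name__ == "__main__":
    stored = ["Levenshtein distance", "database"]
    user_text = "I stored the strings in a databse and compute the levenstein distanse."
    for found, term, d in find_fuzzy_matches(user_text, stored):
        print(f"{found!r} ~ {term!r} (distance {d})")
```

This brute-force loop compares every window with every stored string, so it gets slow on large inputs or large databases. An index such as a BK-tree, or narrowing the candidates first with a spell checker as suggested above, avoids most of those comparisons. If a similarity ratio is acceptable in place of a true edit distance, Python's standard `difflib.get_close_matches` is a ready-made alternative. Note that it uses `SequenceMatcher` (Ratcliff/Obershelp) rather than Levenshtein.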
