Is there an API or algorithm that matches strings in a block of text and accounts for possible misspellings?
I'm looking for a solution that could process large blocks of user input text and match them against a set of strings that I've got stored in a database. The only problem is that the strings in the user input text are frequently misspelled. (The strings in the database are spelled correctly.)
I know modern search engines suggest results that account for misspellings, but I have not a clue what those algorithms are called or if they even apply to my situation.
Firstly, I need to know the names of those algorithms (or what they are generally called). Secondly, I need to know how to apply them. Any ideas?
Use libaspell to find the misspelled words, then refine its suggestions with a clustering algorithm (k-means?) or by ranking them by Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance), which counts the single-character edits needed to turn one string into another. Your code should also handle incomplete and non-dictionary words if you are searching something like a parts catalog or a scientific book database.
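The general name for this kind of problem is approximate (or fuzzy) string matching. As a rough sketch of the Levenshtein part only, here is a plain Python version that compares each word of the input against a list of correctly spelled strings. The function names, the sample parts list, and the cutoff of 2 edits are all made up for illustration; it does not use libaspell, and a linear scan like this would likely be too slow for a large database without some kind of index.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def closest_match(word, known_strings, max_distance=2):
    """Return the known string nearest to word, or None if none is within max_distance."""
    best, best_dist = None, max_distance + 1
    for candidate in known_strings:
        d = levenshtein(word.lower(), candidate.lower())
        if d < best_dist:
            best, best_dist = candidate, d
    return best


# Hypothetical data: correctly spelled strings (as if loaded from the database)
known = ["carburetor", "alternator", "radiator"]
text = "my carburator and raditor need replacing"

# Map each input word to its nearest known string, dropping words with no close match
matches = {w: closest_match(w, known) for w in text.split()}
matches = {w: m for w, m in matches.items() if m is not None}
print(matches)
```

The cutoff matters: too low and real misspellings are missed, too high and short unrelated words start matching. It probably needs tuning against your actual data, and possibly scaling with word length.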