Suggestive Spelling Checker based on edit distance and lcs?
How can i implement a simple spell checker that accepts a misspelled word 开发者_StackOverflow社区and an edit distance threshold then produce a list of suggested correct words . That is wanted to be achieved using an algorithm that 1-use both edit distance and longest common subsequence 2-don't calculate the edit distance for each word in dictionary???????
So you have a dictionary of words and you want to use edit distance to calculate the closest match to a given word.
Some suggestions to shortcut the process of checking all possible options:
- Cache the closest matches to words when you do the calculations. If someone enters "speling" and your top matches are "spelling", "spewing", and "spilling", save these matches with their calculated distances and threshold. Next time you see "speling", you can just retrieve any results where the threshold <= the new threshold.
- With levenshtein distance, when calculating, you can discard any words where the length difference is greater than the threshold. You should be able to shortcut that process. Of course if you want common subsequences, this falls over.
- Modify the levenshtein distance calculator to break as soon as the threshold is reached. You'll still be starting to check a lot of words that don't match, but you'll do less work by breaking early.
In case you're still after a levenshtein distance algorithm, have a look at this example. It's pretty fast.
http://dotnetperls.com/levenshtein
Peter Norvig did this in Python, and someone ported his spellchecker to C#
http://norvig.com/spell-correct.html
http://www.codegrunt.co.uk/?page=cSharp#norvigSpell
精彩评论