Check whether an input string contains only allowed words from a file
I'm starting to write a program that checks whether the word(s) entered by the user are spelled correctly. If they are not, the program should be able to correct them letter by letter, moving letters from one position to another until the word matches an entry in a list of words stored in a .txt file.
e.g. input:
"tihs is nto a corerct sentnece" (this is not a correct sentence)
If the user has entered a misspelled word, the program should scan the .txt file, find the closest correct word, replace the misspelled word with it, and output the corrected sentence, like:
"this is not a correct sentence" from (tihs is nto a corerct sentnece)
Every incorrect word should be checked against the .txt file.
My question is: how should I start coding this? Thanks.
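A possible first step, sketched in Python under a few assumptions: the .txt file contains whitespace-separated words (one or more per line), matching is case-insensitive, and a "word" is a run of ASCII letters. The names `load_dictionary` and `find_misspelled` and the file name `words.txt` are hypothetical. The sketch only flags words that are not in the list; choosing a replacement is a separate step.

```python
import re
from typing import Iterable, List, Set

def load_dictionary(lines: Iterable[str]) -> Set[str]:
    """Collect lowercase words from the lines of a word-list file."""
    words: Set[str] = set()
    for line in lines:
        words.update(w.lower() for w in line.split())
    return words

def find_misspelled(sentence: str, dictionary: Set[str]) -> List[str]:
    """Return the words of `sentence` that are not in `dictionary`, in order."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in dictionary]

# With a real file (hypothetical name):
#   with open("words.txt", encoding="utf-8") as f:
#       dictionary = load_dictionary(f)
# Here an in-memory list of lines stands in for the file.
dictionary = load_dictionary(["this is not a", "correct sentence"])
print(find_misspelled("tihs is nto a corerct sentnece", dictionary))
```

Because `load_dictionary` accepts any iterable of lines, an open file object and a plain list work the same way, which keeps the logic testable without touching the disk.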
From "How to write a spelling corrector" by Peter Norvig:
The full details of an industrial-strength spell corrector like Google's would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.
Peter Norvig is a very talented computer scientist, and a great explainer, so I highly recommend his blog.
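The core idea of that essay can be sketched roughly as follows. This is a simplified adaptation, not Norvig's exact code: it generates every string one edit away (delete, adjacent transpose, replace, insert), keeps only those found in the word list, falls back to two edits, and picks the most frequent candidate. Since a plain word list may have no frequency information, the `Counter` here just counts occurrences in the file, and ties are broken alphabetically (an assumption made for determinism). The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def edits2(word: str) -> Set[str]:
    """All strings two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words: Iterable[str], counts: Counter) -> Set[str]:
    """Keep only the strings that appear in the word list."""
    return {w for w in words if w in counts}

def correct(word: str, counts: Counter) -> str:
    # Prefer the word itself, then known words 1 edit away, then 2 edits away.
    # `or` short-circuits, so edits2 is only computed when edits1 finds nothing.
    candidates = (known([word], counts) or known(edits1(word), counts)
                  or known(edits2(word), counts) or {word})
    # Highest count wins; ties are broken alphabetically.
    return min(candidates, key=lambda w: (-counts[w], w))

def correct_sentence(sentence: str, counts: Counter) -> str:
    return " ".join(correct(t, counts) for t in re.findall(r"[a-z]+", sentence.lower()))

counts = Counter(re.findall(r"[a-z]+", "this is not a correct sentence"))
print(correct_sentence("tihs is nto a corerct sentnece", counts))
```

Every misspelling in the question's example is a single swap of adjacent letters, so each one is fixed at distance 1. With a large word list, the frequency counts are what decide between several candidates at the same distance.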
First, you obviously need to find the misspelled words. Next, you need a way of scoring the words that could be the intended correct word. For example, "folor" could be "floor" with two letters swapped, or "color" with an 'f' typed instead of a 'c'. Both candidates are very close: one swaps two adjacent letters, the other replaces a character with one next to it on the keyboard. You would have to assign each kind of error a cost based on how common you think that mistake is. In general, you could put each candidate word into a priority queue keyed by its cost and pull the lowest-cost candidates from there. If the only kind of error is the one described in the question (swapped letters), the problem is a little easier because there are fewer candidates to consider, but you would still have to score each word.
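One way to make "assign each kind of error a cost" concrete is a weighted edit distance. The sketch below uses optimal string alignment (Levenshtein distance plus adjacent transpositions) and feeds the results into a `heapq` min-heap. The weights, the rough QWERTY adjacency rule (same or neighboring row, at most one column apart, ignoring row stagger), and all function names are hypothetical choices for illustration, not values from the answer.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical weights: cheaper costs for the mistakes assumed to be common.
TRANSPOSE_COST = 0.5
NEAR_KEY_COST = 0.5
SUBSTITUTE_COST = 1.0
INSERT_DELETE_COST = 1.0

# Approximate QWERTY grid; keys count as "near" if their rows and columns
# each differ by at most one.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS: Dict[str, Tuple[int, int]] = {
    ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)
}

def near_keys(a: str, b: str) -> bool:
    if a not in KEY_POS or b not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1

def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return NEAR_KEY_COST if near_keys(a, b) else SUBSTITUTE_COST

def weighted_distance(s: str, t: str) -> float:
    """Optimal-string-alignment distance using the weights above."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INSERT_DELETE_COST
    for j in range(1, m + 1):
        d[0][j] = j * INSERT_DELETE_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + INSERT_DELETE_COST,                 # delete s[i-1]
                d[i][j - 1] + INSERT_DELETE_COST,                 # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),   # substitute/match
            )
            # Swap of two adjacent letters.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + TRANSPOSE_COST)
    return d[n][m]

def rank_candidates(word: str, dictionary: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Push every dictionary word with its cost onto a min-heap; pop the k cheapest."""
    heap: List[Tuple[float, str]] = []
    for candidate in dictionary:
        heapq.heappush(heap, (weighted_distance(word, candidate), candidate))
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]

print(rank_candidates("folor", ["floor", "color", "flour"], k=2))
```

With these particular weights, "floor" and "color" both score 0.5 for "folor", which reflects the point above that the two are equally close; the heap then breaks the tie alphabetically. Changing the weights is how you would encode which mistake you consider more common.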
Note: "nto" could also be corrected to "ton". To rule out this possibility, you would have to check grammar as well.