Data on the Frequency of Edit Operations Required to Correct a Misspelt Word

2022-12-31 15:47 问答作者：

Does anybody know of any data that relates to the frequency of the types of mistakes the people make when they misspell a word? I'm not referring to words themselves, but tje errors that are made by the typist. For example, I personally make transposition errors the most followed by deletion errors (that is, not including a letter I should), substitution errors and lastly, insertion errors. However, it would not surprise me to find out that typing a wrong letter (a substitution error, e.g., xat instead of cat) is more frequent t开发者_如何学编程han not including a letter.

My purpose is to be able to make best guesses at correcting a word when I only have the original user's input. The idea being that if one type of error is more frequent than others, then it's more likely that correcting a word via that type of operation is correct. I don't object to using a database of commonly misspelt words but I prefer an algorithmic solution to depending on a corpus--especially if it might be faster.

You could try using something like calculating the Levenshtein distance between the mistyped word and the words in a dictionary. I'm not sure that's what you want though.

继续阅读：corpus database spell-checking

Data on the Frequency of Edit Operations Required to Correct a Misspelt Word

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？