Place dots where a word is misspelled

2022-12-15 00:13 问答作者：

I'm creating a web app in PHP where people can try to translate words they need to learn for school.

For example, someone needs to translate the Dutch word 'weer' to 'weather' in English, but unfortunately he types 开发者_StackOverflow中文版'whether'. Because he almost typed the right word, I want to give him another try, with dots '.' on the places where he made a mistake:

Language A:   weer
Language B:   weather
Input:        whether
Output:       w..ther

Or, for example

Language A:   fout
Language B:   mistake
Input:        mitake
Output:       mi.take

Or:

Language A:   echt
Language B:   genuine
Input:        genuinely
Output:       genuinely (almost good, shorten the word a little bit)

But, if the input differs too much from the desired translation, I don't want to get output like ........

I heard about Levenshtein distance and I think I need an algorithm that is much like that one, but I don't know how to place dots on the right place instead of echoing how many operations are to be done.

So, how can I return the misspelled word with dots on the places where someone made a mistake?

First, take a look at the Levenshtein algorithm on wikipedia. Then go on and look at the examples and the resulting matrix d on the article page:

                *k*     *i*     *t*     *t*     *e*     *n*
        >0       1       2       3       4       5       6 
*s*      1      >1       2       3       4       5       6 
*i*      2       2      >1       2       3       4       5 
*t*      3       3       2      >1       2       3       4 
*t*      4       4       3       2      >1       2       3 
*i*      5       5       4       3       2      >2       3 
*n*      6       6       5       4       3       3      >2 
*g*      7       7       6       5       4       4      >3

The distance is found on lower right corner of the matrix, d[m, n]. But from there it is now possible to follow backtrack the minimum steps to the upper left of of the matrix, d[1, 1]. You just go left, up-left, or up at each step whichever minimizes the path. In the example above, you'd find the path marked by the '>' signs:

  s i t t i n g      k i t t e n
0 1 1 1 1 2 2 3    0 1 1 1 1 2 2 3
  ^       ^   ^      ^       ^   ^
  changes in the distance, replace by dots

Now you can find on the minimum path at which locations d[i,j] the distance changes (marked by ^ in the example above), and for those letters you put in the first (or second) word a dot at position i (or j respectively).

Result:

  s i t t i n g      k i t t e n
  ^       ^   ^      ^       ^   ^
  . i t t . n .      . i t t . n .

The terminology you are looking for is called "edit distance." Using something like Levenshtein distance will tell you the number of operations needed to transform one string into the other (insertions, deletion, substitutions, etc).

Here is a list of other "editing distance" algorithms.

Once you decide that a word is "close enough" (i.e. it doesn't exceed some threshold of edits needed), you can show where the edits need to occur (by showing dots).

So how do you know where to put the dots?

The interesting thing about the "Levenshtein distance" is it uses a M x N matrix with one word on each axis (see the sample matrix in Levenshtein article). Once you create the matrix, you can determine which letters require "additional edits" to be correct. That's where you put the dots. If the letter requires "no additional edits," you simply print the letter. Pretty cool.

I think you need to do a multi step process not just Levenshtein. First I would check if the input word is a form of target word. That would catch your 3rd example and also no worry about adding dots. You could also use this step to catch synonyms. The next step is to check the length difference of the two strings.

If the difference is 0 you can do a letter for letter comparison to place the dots. If you don't want to show all dots then you should keep a count of dots placed and once over the limit display some error message. (Sorry that was incorrect)

If the difference is shows the input is longer you need to check for a letter to be delete would fix the problem. Here you can use Levenshtein to see how close they are if they are too far away show your error message if it is in range you will need to do the steps of Levenshtein in reverse and mark the changes somehow. Not sure how you want to show that a letter needs deleted.

If the difference shows the input is shorter you can do use Levenshtein distance to see if the two words are close enough or show the error. Then do the steps in reverse again adding dots for insertions and dots for substitutions.

Actually the last two steps can be combined into one function that runs through the algorithm and remembers the insert delete or substitution and changes the output accordingly.

继续阅读：algorithm levenshtein-distance php

Place dots where a word is misspelled

So how do you know where to put the dots?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

So how do you know where to put the dots?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？