How to compare two strings and count the number of errors?

2023-02-18 11:56 Q&A author:

I've searched online for a diff algorithm, but none of them do what I am looking for. It's for a texting contest (as in cell-phone texting), and I need to compare the entry text to the master text, recording the errors along the way. I am semi-new to C#, and I understand most of the string functions. I didn't think this was going to be that hard a problem, but alas, I just can't wrap my head around it.

I have a form with 2 RichTextBoxes (one above the other) and 2 buttons. The top box holds the master text (string) and the bottom box holds the entry text (string). Every contestant sends a text to an email account; from the email, we copy and paste the text into the Entry RTB and compare it to the Master RTB. Each single word and each single space counts as one thing to check. A word, no matter how many errors it has, is still 1 error. For every error, 1 second is added to the contestant's time.

Examples:

Hello there! <= 3 checks (2 words and 1 space)
- Helothere! <= 2 errors (Helo and the missing space)
- Hello there!! <= 1 error (extra ! at the end of there!)
Hello there! How are you? <= 9 checks (5 words and 4 spaces)
- Helothere!! How a re you? <= still 9 checks, 4 errors (Helo, missing space, extra !, and a space inside are)
- Hello there!@ Ho are yu?? <= 3 errors (@ at the end of there!, missing w in How, and missing o plus extra ? in yu??; those last two are in the same word, so they count as 1 error)

What I have so far: I've created 6 arrays (3 for the master text, 3 for the entry text):

- CharArray of all chars
- StringArray of all strings (words), including the spaces
- IntArray with the length of each string in StringArray
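For reference, here is a minimal sketch of how those three arrays could be built so that each word and each single space is its own entry, which makes the StringArray length equal to the number of checks. The class and method names (`TextArrays`, `Tokenize`, `Build`) are hypothetical, and the sketch assumes words are separated only by the space character (no tabs or newlines).

```csharp
using System.Collections.Generic;
using System.Linq;

static class TextArrays
{
    // Split text into words and single spaces; every space becomes its own token,
    // so "Hello there!" -> "Hello", " ", "there!" (3 checks).
    public static string[] Tokenize(string text)
    {
        var tokens = new List<string>();
        int start = 0; // index where the current word began
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != ' ') continue;
            if (i > start) tokens.Add(text.Substring(start, i - start)); // word before this space
            tokens.Add(" ");                                             // the space itself
            start = i + 1;
        }
        if (start < text.Length) tokens.Add(text.Substring(start)); // last word
        return tokens.ToArray();
    }

    // The three arrays from the question (chars, words-and-spaces, lengths) built from one text.
    public static (char[] Chars, string[] Strings, int[] Lengths) Build(string text)
    {
        string[] strings = Tokenize(text);
        return (text.ToCharArray(), strings, strings.Select(s => s.Length).ToArray());
    }
}
```

With this, `TextArrays.Build("Hello there! How are you?").Strings` has 9 entries, matching the 9 checks in the example.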

My biggest trouble is when the entry text is wrong and is shorter or longer than the master. I keep getting IndexOutOfRangeException errors (understandably), but I can't work out how to check for this and write code that compensates. I hope I have made clear what I need help with. Any code examples, or anything else to point me in the right direction, would be very helpful.
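The IndexOutOfRangeException comes from indexing both arrays with the same counter past the end of the shorter one. A minimal sketch of a bounds-safe comparison, assuming word/space tokens like the StringArray described above: compare only the positions both arrays share, then count every surplus token as one error. `PositionalCompare` and `CountPositionalErrors` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PositionalCompare
{
    // Words and single spaces as separate tokens ("Hello there!" -> "Hello", " ", "there!").
    static string[] Tokenize(string text) =>
        Regex.Matches(text, "[^ ]+| ").Cast<Match>().Select(match => match.Value).ToArray();

    // Compare token by token without ever indexing past the end of either array.
    public static int CountPositionalErrors(string master, string entry)
    {
        string[] masterTokens = Tokenize(master);
        string[] entryTokens = Tokenize(entry);
        int shared = Math.Min(masterTokens.Length, entryTokens.Length); // positions both arrays have
        int errors = 0;
        for (int i = 0; i < shared; i++)
            if (masterTokens[i] != entryTokens[i]) errors++;
        // Tokens beyond the shorter array are missing or extra: one error each.
        errors += Math.Abs(masterTokens.Length - entryTokens.Length);
        return errors;
    }
}
```

This never throws, and it gives 1 for "Hello there!!" and 3 for "Hello there!@ Ho are yu??". It is naive, though: as soon as a word or space is missing or added, every later position shifts, so "Helothere!" scores 3 instead of the expected 2. Handling that shift needs an alignment step, such as the edit-distance approach in the answers.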

Have you looked into the Levenshtein distance algorithm? It returns the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, which in your case would be the texting errors. An implementation based on the pseudocode on the Wikipedia page passes the first 3 of your 4 use cases:

Assert.AreEqual(2, LevenshteinDistance("Hello there!", "Helothere!"));
Assert.AreEqual(1, LevenshteinDistance("Hello there!", "Hello there!!"));

Assert.AreEqual(4, LevenshteinDistance("Hello there! How are you?", "Helothere!! How a re you?"));
Assert.AreEqual(3, LevenshteinDistance("Hello there! How are you?", "Hello there!@ Ho are yu??"));  //fails, returns 4 errors

So while not perfect out of the box, it is probably a good starting point for you. Also, if you have too much trouble implementing your scoring rules, it might be worth revisiting them.

hth

Update:

Here is the result for the string you requested in the comments:

Assert.AreEqual(7, LevenshteinDistance("Hello there! How are you?", "Hlothere!! Hw a reYou?"));  //fails, returns 8 errors

And here is my implementation of the Levenshtein Distance algorithm:

int LevenshteinDistance(string left, string right)
{
    if (left == null || right == null)
    {
        return -1;
    }

    if (left.Length == 0)
    {
        return right.Length;
    }

    if (right.Length == 0)
    {
        return left.Length;
    }

    int[,] distance = new int[left.Length + 1, right.Length + 1];

    for (int i = 0; i <= left.Length; i++)
    {
        distance[i, 0] = i;
    }

    for (int j = 0; j <= right.Length; j++)
    {
        distance[0, j] = j;
    }

    for (int i = 1; i <= left.Length; i++)
    {
        for (int j = 1; j <= right.Length; j++)
        {
            if (right[j - 1] == left[i - 1])
            {
                distance[i, j] = distance[i - 1, j - 1];
            }
            else
            {
                distance[i, j] = Min(distance[i - 1, j] + 1,      //deletion
                                     distance[i, j - 1] + 1,      //insertion
                                     distance[i - 1, j - 1] + 1); //substitution
            }
        }
    }

    return distance[left.Length, right.Length];
}

int Min(int val1, int val2, int val3)
{
    return Math.Min(val1, Math.Min(val2, val3));
}
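Following the suggestion to revisit the scoring, one possible adaptation (an assumption, not part of the implementation above) is to run the same dynamic-programming table over word and space tokens instead of characters, so a word with several typos costs a single edit. `TokenLevenshtein.TokenDistance` is a hypothetical name, and tokens are assumed to be separated by single spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenLevenshtein
{
    // Words and single spaces as separate tokens ("Hello there!" -> "Hello", " ", "there!").
    static string[] Tokenize(string text) =>
        Regex.Matches(text, "[^ ]+| ").Cast<Match>().Select(match => match.Value).ToArray();

    // Same table as the character version, but each cell compares whole tokens.
    public static int TokenDistance(string master, string entry)
    {
        string[] a = Tokenize(master);
        string[] b = Tokenize(entry);
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // token missing from entry
                                            d[i, j - 1] + 1),  // extra token in entry
                                   d[i - 1, j - 1] + cost);    // match or mistyped token
            }
        }
        return d[a.Length, b.Length];
    }
}
```

With this variant the case that failed above, "Hello there!@ Ho are yu??", scores 3, and "Hello there!!" scores 1. The trade-off is that merged words are penalized more heavily: "Helothere!" scores 3 (one mistyped token plus two missing ones) rather than 2.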

You need to come up with a scoring system that works for your situation.

I would split the text into an array of words at each space.

- If a word is found at the same index: +5.
- If a word is found at the same index ±1: +3 (keep a counter of how many words differ, to widen the ± correction).
- If a needed word is found as part of another word: +2.

And so on. Matching words is hard; coming up with a rules engine that works is 'easier'.
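A minimal sketch of the point rules listed above, assuming words are split on spaces and using the +5 / +3 / +2 values from the list. The drift counter that widens the ± window is left out, and `WordScore`, `Score` and `WordAt` are hypothetical names; a higher score means a closer match.

```csharp
using System;

static class WordScore
{
    public static int Score(string master, string entry)
    {
        string[] mw = master.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] ew = entry.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int score = 0;
        for (int i = 0; i < mw.Length; i++)
        {
            if (WordAt(ew, i) == mw[i])
                score += 5; // found at the same index
            else if (WordAt(ew, i - 1) == mw[i] || WordAt(ew, i + 1) == mw[i])
                score += 3; // found one position off
            else if (Array.Exists(ew, w => w.Contains(mw[i])))
                score += 2; // found as part of another word
        }
        return score;
    }

    // Returns null instead of throwing when the index is out of range.
    static string WordAt(string[] words, int index) =>
        index >= 0 && index < words.Length ? words[index] : null;
}
```

For example, "there! Hello" scores 6 against "Hello there!" (both words found one position off), and "Hellothere!" scores 4 (both words found only inside another word).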

I once implemented an algorithm (which I can't find at the moment; I'll post the code when I find it) that looked at the total number of PAIRS in the target string. E.g. "Hello, World!" would have 12 pairs: { "He", "el", "ll", ..., "ld", "d!" }.

You then do the same thing on an input string such as "Helo World" so you have { "He",...,"ld" }.

You can then calculate accuracy as a function of correct pairs (input pairs that are in the list of target pairs) and incorrect pairs (input pairs that do not exist in the list of target pairs), compared to the total number of target pairs. Over long enough sentences, this measure would be very fair.
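A sketch of the pair-based measure, assuming overlapping two-character pairs and set membership for "is in the list of target pairs". The answer leaves the exact accuracy formula open, so `Score` below uses one hypothetical choice, (correct - incorrect) / total target pairs; `PairAccuracy`, `Pairs`, `Compare` and `Score` are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

static class PairAccuracy
{
    // All overlapping two-character pairs: "Hello" -> He, el, ll, lo.
    public static List<string> Pairs(string text)
    {
        var pairs = new List<string>();
        for (int i = 0; i + 1 < text.Length; i++)
            pairs.Add(text.Substring(i, 2));
        return pairs;
    }

    // Counts input pairs that do / don't appear among the target pairs.
    public static (int Correct, int Incorrect, int TargetTotal) Compare(string target, string input)
    {
        List<string> targetPairs = Pairs(target);
        var targetSet = new HashSet<string>(targetPairs);
        List<string> inputPairs = Pairs(input);
        int correct = inputPairs.Count(p => targetSet.Contains(p));
        return (correct, inputPairs.Count - correct, targetPairs.Count);
    }

    // Hypothetical scoring formula: reward correct pairs, penalize incorrect ones.
    public static double Score(string target, string input)
    {
        var (correct, incorrect, total) = Compare(target, input);
        return total == 0 ? 0 : (double)(correct - incorrect) / total;
    }
}
```

For "Helo World" against "Hello, World!", 8 of the 9 input pairs appear in the target and only "o " does not, so this formula gives (8 - 1) / 12.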

A simple algorithm would be to check letter by letter. If the letters differ, increment the error count and then look ahead: if the next pair of letters matches, it's a switched (substituted) letter, so just continue; if the mistyped letter matches the next master letter, it's an omission, so treat it accordingly; if the next typed letter matches the expected master letter, it's an insertion, so treat it accordingly. Otherwise the person really messed up, so just continue.

This doesn't catch everything, but with a few modifications it could become comprehensive.
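A minimal sketch of the letter-by-letter idea, applying the look-ahead checks in the order the answer gives them, with a bounds check before every look-ahead. It counts character errors, not word errors, and `LetterByLetter.CountCharErrors` is a hypothetical name.

```csharp
static class LetterByLetter
{
    public static int CountCharErrors(string master, string entry)
    {
        int i = 0, j = 0, errors = 0; // i walks master, j walks entry
        while (i < master.Length && j < entry.Length)
        {
            if (master[i] == entry[j]) { i++; j++; continue; }

            errors++;
            bool hasNextM = i + 1 < master.Length;
            bool hasNextE = j + 1 < entry.Length;

            if (hasNextM && hasNextE && master[i + 1] == entry[j + 1])
            {
                i++; j++; // next pair lines up again: a substituted letter
            }
            else if (hasNextM && entry[j] == master[i + 1])
            {
                i++;      // typed letter is the next master letter: a letter was omitted
            }
            else if (hasNextE && entry[j + 1] == master[i])
            {
                j++;      // next typed letter is the expected one: an extra letter was inserted
            }
            else
            {
                i++; j++; // nothing lines up: count it and move on
            }
        }
        // Whatever is left on either side is missing or extra text.
        errors += (master.Length - i) + (entry.Length - j);
        return errors;
    }
}
```

It returns 2 for "Helothere!" and 1 for "Hello there!!", but 4 for "Hello there!@ Ho are yu??", because "yu??" contains two character errors; grouping errors by word would need an extra pass.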

A rough attempt at pseudocode (edit: new idea, see the comments). I don't know the string functions off the top of my head, so you'll have to fill that part in. The algorithm somewhat fails for words that repeat a lot, though.

string entry;  // we'll pretend that this has stuff inside
string master; // this too...
string tempentry = entry; // words get stripped off, so work on a copy
int e = 0; // index of the last master word that was found in the entry
int m = 0; // index of the current master word
int errors = 0;
while (there are words in tempentry and words left in master) // !tempentry.empty() ?
  string mword = the next word in master;
  m++;
  int eplace = find mword in tempentry; // eplace is the index where mword starts in tempentry
  if (eplace == -1) // word not there...
    continue;
  else
     errors += m - e - 1; // master words skipped since the last match
     errors += number of words in tempentry before eplace; // extra words typed before the match
     e = m; // remember the last matched master word
     tempentry = everything in tempentry after mword; // Substring?
// after the loop: all words and spaces left in master count as errors

There are probably still a couple of bounds-checking errors to fix here, but it's a good start.
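A C# sketch of the word-search idea, under stated assumptions: words are split on spaces, `string.IndexOf` does the "find", and when a master word is missing and an extra word was typed in its place, the two are counted once (via `Math.Max`) rather than twice. These choices, and the names `WordSearch` and `CountWordErrors`, are illustrative, not the answer's. Like the pseudocode, it only scores words; spaces are not checked separately.

```csharp
using System;

static class WordSearch
{
    // Number of space-separated words in a piece of text.
    static int CountWords(string s) =>
        s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    public static int CountWordErrors(string master, string entry)
    {
        string[] masterWords = master.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string rest = entry;  // "tempentry": the part of the entry not consumed yet
        int lastMatched = -1; // "e": index of the last master word found in the entry
        int errors = 0;

        for (int m = 0; m < masterWords.Length; m++)
        {
            int place = rest.IndexOf(masterWords[m], StringComparison.Ordinal);
            if (place == -1)
                continue; // missing or mistyped; accounted for at the next match

            int skippedMaster = m - lastMatched - 1;               // master words never found
            int extraTyped = CountWords(rest.Substring(0, place)); // words typed before the match
            errors += Math.Max(skippedMaster, extraTyped);         // a mistyped word counts once

            lastMatched = m;
            rest = rest.Substring(place + masterWords[m].Length);  // strip through the matched word
        }

        // Unmatched master words at the end and leftover typed words, paired up the same way.
        errors += Math.Max(masterWords.Length - lastMatched - 1, CountWords(rest));
        return errors;
    }
}
```

On the question's examples it gives 1 for "Hello there!!", 4 for "Helothere!! How a re you?" and 3 for "Hello there!@ Ho are yu??", but only 1 for "Helothere!" because the missing space is not scored. Note also that `IndexOf` matches substrings, so "there!" is found inside "there!@".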
