
PHP - Which word is properly typed?

I'm looking for help writing a script that checks a list of phrases/words, compares them to one another, and determines which one is the properly typed phrase/word.

$arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u');  
$arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you');

So, for each index, it should compare the values from both arrays and keep the properly typed one. In the end, it should produce:

facebook
yahoo messenger
text me later
how are you

Any help is appreciated!


There's no way to "guess" which one is correct; you need a knowledge base (e.g., a dictionary).

This dictionary can be implemented with pspell (aspell), as @Dominic mentioned, or you can use your own array as the dictionary.

If you use an array as the dictionary, you can apply the Levenshtein algorithm, which PHP provides as the levenshtein() function, to calculate the edit distance between two words (i.e., your word and a reference word). You can then iterate over the reference array to find the word(s) with the smallest distance from the one you're checking; those are the best candidates to suggest as a correction. If the distance is 0, the word being checked is already correct.
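A minimal sketch of that nearest-word lookup, assuming a plain PHP array as the dictionary. The function name `closest_word()` and the sample `$dictionary` contents are hypothetical, chosen only for illustration; `levenshtein()` itself is PHP's built-in function.

```php
<?php
// Sketch: suggest the closest entry from a hand-made dictionary array.
// closest_word() and the sample $dictionary are hypothetical examples.
function closest_word(string $word, array $dictionary): array
{
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($dictionary as $candidate) {
        // levenshtein() counts the insertions, deletions and substitutions
        // needed to turn $word into $candidate.
        $distance = levenshtein($word, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }

    // A distance of 0 means $word is already in the dictionary.
    return [$best, $bestDistance];
}

$dictionary = ['facebook', 'messenger', 'yahoo', 'text', 'later'];
[$suggestion, $distance] = closest_word('fbook', $dictionary);
echo "$suggestion ($distance)\n";
```

Here 'fbook' maps to 'facebook' at distance 3 (three inserted letters). On a tie, the strict `<` keeps whichever candidate appears first in the array.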


If your input is fairly simple, you have pspell installed, and the arrays are the same size:

For each index in the two arrays, you could split the phrase on spaces, run pspell_check() on each word, and keep the phrase with the higher percentage of words for which pspell_check() returned true.

Sample code to get you started:

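function percentage_of_good_words($phrase) {
  // pspell_check() needs a dictionary handle; open it once and reuse it.
  static $dict = null;
  if ($dict === null) {
    $dict = pspell_new("en");
  }

  // Split on runs of whitespace and skip empty pieces.
  $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
  $num_good = 0;
  $num_total = count($words);

  if ($num_total == 0) return 0;

  foreach ($words as $word) {
    if (pspell_check($dict, $word)) {
      $num_good++;
    }
  }

  return ($num_good / $num_total) * 100;
}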

$length = count($arr1);
$kept = array();
for ($i = 0; $i < $length; $i++) {
   $percent_from_arr1 = percentage_of_good_words($arr1[$i]);
   $percent_from_arr2 = percentage_of_good_words($arr2[$i]);
   // Ties go to the phrase from $arr2.
   $kept[$i] = $percent_from_arr1 > $percent_from_arr2 ? $arr1[$i] : $arr2[$i];
}

echo implode("\n", $kept);
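If pspell isn't installed, the same scoring idea can be tried against a plain PHP array of known words, as the first answer suggests. This is a minimal sketch: `percentage_known()` and the `$known_words` list are hypothetical names made up for illustration, and the word list is far too small for real use.

```php
<?php
// Sketch: the same "percentage of good words" scoring, but checking words
// against a plain PHP array instead of pspell.
// percentage_known() and $known_words are hypothetical examples.
function percentage_known(string $phrase, array $known_words): float
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return 0.0;
    }

    $good = 0;
    foreach ($words as $word) {
        if (in_array(strtolower($word), $known_words, true)) {
            $good++;
        }
    }

    return $good / count($words) * 100;
}

$known_words = ['facebook', 'yahoo', 'messenger', 'text', 'me', 'later', 'how', 'are', 'you'];
$arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u');
$arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you');

$kept = array();
foreach ($arr1 as $i => $phrase) {
    // Keep whichever phrase has the higher share of known words; ties go to $arr2.
    $kept[$i] = percentage_known($phrase, $known_words) > percentage_known($arr2[$i], $known_words)
        ? $phrase
        : $arr2[$i];
}
echo implode("\n", $kept), "\n";
```

With this word list the loop keeps facebook, yahoo messenger, text me later and how are you. The result depends entirely on what the list contains.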


You need to define some rules for processing these words. Judging by your example, you could use a regex and keep the phrase that is longer, but there may be cases where the longer phrase is not the correct one.
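The length rule on its own (regex preprocessing left out) can be sketched as below. `pick_longer()` is a hypothetical helper name, and the rule is only a heuristic: it happens to fit the sample data, but nothing guarantees that the longer string is the correctly typed one.

```php
<?php
// Sketch of the "keep the longer phrase" heuristic.
// pick_longer() is a hypothetical helper; it assumes both arrays share indexes.
function pick_longer(array $a, array $b): array
{
    $result = array();
    foreach ($a as $i => $phrase) {
        // strlen() counts bytes; mb_strlen() would be needed for multibyte text.
        // On equal length, the phrase from $a is kept.
        $result[$i] = strlen($phrase) >= strlen($b[$i]) ? $phrase : $b[$i];
    }
    return $result;
}

$arr1 = array('fbook', 'yahoo msngr', 'text me later', 'how r u');
$arr2 = array('facebook', 'yahoo messenger', 'txt me l8r', 'how are you');
echo implode("\n", pick_longer($arr1, $arr2)), "\n";
```

For the question's arrays this prints the expected four lines. A pair like 'recieve' vs 'receive' (equal length) shows where the rule has nothing to go on.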


If you had an array you know is correct, it would be very easy to do something like:

foreach ($correct_array as $num => $word) {
    // Compare the known-correct word with the word at the same index.
    if ($word == $tested_array[$num]) {
        echo "this is correct: " . $word . "<br />";
    } else {
        echo "this is incorrectly spelled: " . $tested_array[$num] . "<br />";
    }
}


If all you need to do is make sure a word is spelled properly, you can use in_array(), like this:

// Assumes $arr1 holds the correctly spelled words.
foreach ($arr2 as $val) {
   if (in_array($val, $arr1)) {
     // spelled properly
   } else {
     // spelled incorrectly
   }
}

If you want to actually autocorrect them, it would probably take a fairly complicated algorithm, plus storing every possible misspelling in a database somewhere.
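A very small version of the "store the misspellings" idea can be sketched with a lookup table. Here an in-memory array stands in for the database; `autocorrect()` and the `$corrections` entries are hypothetical examples, and a usable table would need vastly more entries.

```php
<?php
// Sketch: word-by-word autocorrect from a table of known misspellings.
// autocorrect() and $corrections are hypothetical; the array stands in
// for a database table of misspelling => correction pairs.
function autocorrect(string $phrase, array $corrections): string
{
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $i => $word) {
        $key = strtolower($word);
        // Replace only words that have a stored correction.
        if (isset($corrections[$key])) {
            $words[$i] = $corrections[$key];
        }
    }
    return implode(' ', $words);
}

$corrections = array(
    'fbook' => 'facebook',
    'msngr' => 'messenger',
    'txt'   => 'text',
    'l8r'   => 'later',
    'r'     => 'are',
    'u'     => 'you',
);

foreach (array('fbook', 'yahoo msngr', 'txt me l8r', 'how r u') as $phrase) {
    echo autocorrect($phrase, $corrections), "\n";
}
```

Words missing from the table pass through unchanged, which is why the table has to be so large in practice.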
