PHP - return the words that show up the most inside a string

$string = 'I like banana, banana souffle, chocobanana and marshmellows.';
$arr = some_function($string);
// $arr = array('banana' => 3, 'I' => 1, 'like' => 1, ...);

Do you have an idea how to do this most efficiently?


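$str = 'I like banana, banana souffle, chocobanana and marshmellows.';
// Take each distinct word once, so repeated words are not recounted.
$words = array_unique(str_word_count($str, 1));
$freq = array();
foreach ($words as $w) {
  // Count every occurrence of the word in the string, including
  // occurrences inside longer words (e.g. "banana" in "chocobanana").
  $freq[$w] = substr_count($str, $w);
}
arsort($freq); // most frequent words first
print_r($freq);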


You can use array_count_values, e.g.:

$string = 'I like banana, banana souffle, chocobanana and marshmellows';
$s = preg_split("/[, ]+/",$string);
print_r(array_count_values($s));

Note: this only counts whole words, i.e. "banana" will be 2, not 3, because "chocobanana" is not the same word as "banana". If you want to count words that occur inside other words, extra code is necessary.


preg_match_all('!\b\w+\b!', $string, $matches);
$arr = array_count_values($matches[0]);
print_r($arr);
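
To return only the words that show up the most, the counts can be sorted and truncated. This is a minimal sketch, not part of the original answer: the helper name top_words, the $limit parameter and the lowercasing step are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the $limit most frequent whole words.
function top_words(string $text, int $limit = 3): array
{
    // Lowercasing is an assumption, so "Banana" and "banana" count together.
    preg_match_all('/\b\w+\b/', strtolower($text), $matches);
    $counts = array_count_values($matches[0]);
    arsort($counts); // highest count first
    // preserve_keys = true keeps the word keys, even for numeric-looking words.
    return array_slice($counts, 0, $limit, true);
}

print_r(top_words('I like banana, banana souffle, chocobanana and marshmellows.', 2));
```

Like the snippet above, this counts whole words only, so "chocobanana" does not add to the count for "banana".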


Because you want to count partial words, you will need a wordlist of possible words. First split the text into words on whitespace, then loop through those words and, for each one, look for the longest wordlist entry that occurs in it as a substring. This will be very slow if the wordlist is big, but you may be able to speed up the matching by building a suffix array of the word you are searching through.

If you don't find a matching substring, just count the whole word as one.

I hope you understand my idea. It's not that great, but it's the solution I can come up with for your requirements.
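
A rough sketch of that idea could look like the following. The function name count_with_wordlist and the sample wordlist are hypothetical. It uses a plain linear scan with strpos rather than a suffix array, and it splits on non-word characters instead of spaces only, so that punctuation is dropped.

```php
<?php
// Hypothetical helper: counts each token under the longest wordlist entry
// it contains, or under the token itself when nothing matches.
// Assumes $wordlist contains non-empty strings.
function count_with_wordlist(string $text, array $wordlist): array
{
    // Try longer wordlist entries first so the longest match wins.
    usort($wordlist, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $freq = array();
    // Assumption: split on non-word characters, not just spaces.
    $tokens = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        $key = $token; // fallback: count the whole token as one
        foreach ($wordlist as $word) {
            if (strpos($token, $word) !== false) {
                $key = $word;
                break;
            }
        }
        $freq[$key] = ($freq[$key] ?? 0) + 1;
    }
    arsort($freq); // most frequent first
    return $freq;
}

$str = 'I like banana, banana souffle, chocobanana and marshmellows.';
print_r(count_with_wordlist($str, array('banana', 'souffle')));
```

With this wordlist, "chocobanana" is counted under "banana", while tokens that match no entry, such as "marshmellows", are counted as themselves. The inner loop is linear in the size of the wordlist, which is the slow part mentioned above.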
