Sorting CSV data according to occurrences of common words

I have a large data set coming from a CSV file that looks something like this:

url1, comment1
url2, comment2

I need to find the words that the comments have in common and sort the rows by how often those common words occur in each row.

At the moment I am able to get the common words, but I'm lost as to how to sort the rows by common word without exhausting memory.

Below is my very inefficient code.

$data = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false) {
  $data[] = $row[1];
}

$str = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', implode('', $data))));

$words = explode(" ", $str);
var_dump(array_count_values($words));


Loading the exploded words into a database sounds like a good idea.

Or you can try this:

$summary = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false)
{
  // handle one comment at a time instead of keeping every row in memory
  $str   = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', $row[1])));
  $words = explode(" ", $str);
  foreach ($words as $word)
  {
    if ($word === '') continue; // an empty comment yields one empty token
    $word = strtolower($word); // lowercase to reduce variations
    if (!isset($summary[$word]))
    {
      $summary[$word] = 0; // first time this word is seen
    }
    $summary[$word]++;
  }
}
/* $summary now contains the count for every word */
/* keep an eye on the size of $summary; it can grow quite large */
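
The counting loop gives the word totals, but the question also asks for the rows to be sorted. One way that might work without loading the comments into memory is a second pass over the file that scores each row and remembers only its byte offset. The sketch below is an illustration under assumptions, not a tested solution for your data: the function names (tokenize, countWords, rankRows) and the $minRows threshold are made up here, a "common" word is assumed to be one that appears in at least $minRows rows, and a row's score is assumed to be the number of its words that are common.

```php
<?php
// Split a comment into lowercase words, dropping punctuation and empty tokens.
function tokenize(string $comment): array
{
    $clean = str_replace(array('!', '?', '.', ','), ' ', strtolower($comment));
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Pass 1: for each word, count the number of rows it appears in.
function countWords(string $path): array
{
    $counts = array();
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (!isset($row[1])) {
            continue; // blank or malformed line
        }
        foreach (array_unique(tokenize($row[1])) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    fclose($fh);
    return $counts;
}

// Pass 2: score each row and keep only "byte offset => score".
// $minRows is a hypothetical threshold for what counts as a common word.
function rankRows(string $path, array $counts, int $minRows = 2): array
{
    $scores = array();
    $fh = fopen($path, 'r');
    while (true) {
        $offset = ftell($fh); // where this row starts in the file
        $row = fgetcsv($fh, 0, ',', '"', '\\');
        if ($row === false) {
            break;
        }
        if (!isset($row[1])) {
            continue;
        }
        $score = 0;
        foreach (tokenize($row[1]) as $word) {
            if (($counts[$word] ?? 0) >= $minRows) {
                $score++;
            }
        }
        $scores[$offset] = $score;
    }
    fclose($fh);
    arsort($scores); // highest score first, offsets kept as keys
    return $scores;
}
```

To print the rows in ranked order, reopen the file, fseek() to each key of the array returned by rankRows(), and read one row with fgetcsv(). Memory use is then roughly one integer per row plus one counter per distinct word, which may still be too much for a very large vocabulary; in that case the database approach is probably the safer choice.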