Sorting CSV data according to occurrences of common words
I have a large data set coming from a CSV file, which looks something like this:
url1, comment1
url2, comment2
I need to find the common words between the comments and sort the rows based on how often those common words occur in each row.
At the moment I am able to get the common words, but I'm lost as to how to sort the rows per common word without exhausting the memory.
Below is my very inefficient code.
$data = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false) {
    $data[] = $row[1];
}
$str = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', implode('', $data))));
$words = explode(" ", $str);
var_dump(array_count_values($words));
Loading the exploded words into a database sounds like a good idea,
or you can try this:
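$summary = array();
$data = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false)
{
    $data[] = $row[1];
    $str = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', $row[1])));
    $words = explode(" ", $str);
    foreach ($words as $word)
    {
        if ($word === '') continue; // an empty comment yields one empty string
        $word = strtolower($word); // lowercase to reduce variations
        // start the counter at 1 on first sight to avoid an undefined-index notice
        $summary[$word] = isset($summary[$word]) ? $summary[$word] + 1 : 1;
    }
}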
/* $summary now contains the count for every word */
/* keep an eye on the size of $summary, it could grow quite large */
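For the sorting part, one possible approach is to read the file twice and keep only a file offset and a score per row in memory, rather than the comments themselves. The sketch below is only an illustration under some assumptions: the helper names (tokenize, countWords, rankRows, sortedRows) are made up for this example, a word is treated as "common" when it appears in at least two rows, and a row's score is simply how many of its words are common. Adjust both rules to whatever "common" means for your data.

```php
<?php
// Hypothetical helper: split a comment into lowercase words.
function tokenize(string $comment): array
{
    $clean = str_replace(array('!', '?', '.', ','), ' ', strtolower($comment));
    // PREG_SPLIT_NO_EMPTY drops the empty strings left by repeated spaces.
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Pass 1: count in how many rows each word appears.
function countWords(string $path): array
{
    $summary = array();
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (!isset($row[1])) {
            continue; // blank or malformed line
        }
        foreach (array_unique(tokenize($row[1])) as $word) {
            $summary[$word] = ($summary[$word] ?? 0) + 1;
        }
    }
    fclose($fh);
    return $summary;
}

// Pass 2: score every row, remembering only its byte offset in the file.
function rankRows(string $path, array $summary): array
{
    $scores = array();
    $fh = fopen($path, 'r');
    while (true) {
        $offset = ftell($fh); // where this row starts
        $row = fgetcsv($fh, 0, ',', '"', '\\');
        if ($row === false) {
            break;
        }
        if (!isset($row[1])) {
            continue;
        }
        $score = 0;
        foreach (tokenize($row[1]) as $word) {
            // Assumption: "common" means the word occurs in 2 or more rows.
            if (($summary[$word] ?? 0) >= 2) {
                $score++;
            }
        }
        $scores[$offset] = $score;
    }
    fclose($fh);
    arsort($scores); // highest score first, offsets kept as keys
    return $scores;
}

// Yield array(score, row) pairs in sorted order, re-reading one row at a time.
function sortedRows(string $path): Generator
{
    $scores = rankRows($path, countWords($path));
    $fh = fopen($path, 'r');
    foreach ($scores as $offset => $score) {
        fseek($fh, $offset);
        yield array($score, fgetcsv($fh, 0, ',', '"', '\\'));
    }
    fclose($fh);
}
```

You would then loop over sortedRows('yourfile.csv') and write each row out as it arrives. The word counts still have to fit in memory, so if the vocabulary itself is huge, the database suggestion is probably the safer route.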