List directory filenames by file word count
I'd like to,
- Check the word count for a folder full of text files.
- Output a list of the files arranged by word count in the format - FILENAME is WORDCOUNT
I know str_word_count is used to get individual wordcounts for files but I'm not sure how to rearrange the output.
Thanks in advance.
Adapted from here.
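<?php
$files = array();
$it = new DirectoryIterator("/tmp");
foreach ($it as $fileinfo) {
    // Skip ".", ".." and subdirectories
    if (!$fileinfo->isFile()) continue;
    // Use the full path; the bare filename only resolves if /tmp is the current directory
    $count = str_word_count(file_get_contents($fileinfo->getPathname()));
    // Zero-padded count as key prefix, so ksort() orders entries by word count
    $files[sprintf("%010d", $count) . $fileinfo->getFilename()] =
        array($count, $fileinfo->getFilename());
}
ksort($files);
foreach ($files as $tup) {
    printf("%s is %d\n", $tup[1], $tup[0]);
}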
EDIT: It would be more elegant to have each key of $files be the file name and each value be the word count, and then sort by value.
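A minimal sketch of that variant, assuming PHP 7 or later. The function name word_counts_by_file is hypothetical and used only for illustration; asort() sorts by value while keeping the filename keys.

```php
<?php
// Hypothetical helper: map each regular file in $dir to its word count,
// sorted ascending by word count.
function word_counts_by_file(string $dir): array
{
    $counts = array();
    foreach (new DirectoryIterator($dir) as $fileinfo) {
        if (!$fileinfo->isFile()) {
            continue; // skip ".", ".." and subdirectories
        }
        $counts[$fileinfo->getFilename()] =
            str_word_count(file_get_contents($fileinfo->getPathname()));
    }
    asort($counts); // sort by value, preserving the filename keys
    return $counts;
}

foreach (word_counts_by_file("/tmp") as $name => $count) {
    printf("%s is %d\n", $name, $count);
}
```

Swapping asort() for arsort() would list the largest files first.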
I don't use PHP, but I would:
1. create an array to hold filename and wordcount
2. read through the folder full of text files and, for each one, save the filename and wordcount to the array
3. sort the array by wordcount
4. output the array
To store the information (#2) I would put it into a 2D array. There is more information about 2D arrays here at Free PHP Tutorial. Thus array[0][0] would hold the name of the first file and array[0][1] would hold its wordcount; array[1][0] and array[1][1] would hold the same for the next file.
To sort the array (#3) you can follow the tutorial at firsttube.com.
Then, to output it (#4), I would loop through the array and print the first and second element of each row.
for ($i = 0; $i < count($array); ++$i) {
    // print the filename ($array[$i][0]) and wordcount ($array[$i][1])
    printf("%s is %d\n", $array[$i][0], $array[$i][1]);
}
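Putting those steps together, a rough sketch might look like the following. It assumes PHP 7 or later (for the <=> operator); the function name sorted_word_count_rows is hypothetical, and usort() with a comparison callback stands in for whatever sorting method the linked tutorial describes.

```php
<?php
// Hypothetical helper: build rows of array(filename, wordcount) for the
// regular files in $dir, sorted ascending by wordcount.
function sorted_word_count_rows(string $dir): array
{
    $rows = array();
    foreach (scandir($dir) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path)) {
            continue; // skip ".", ".." and subdirectories
        }
        $rows[] = array($name, str_word_count(file_get_contents($path)));
    }
    // Compare on the second column (the wordcount)
    usort($rows, function ($a, $b) {
        return $a[1] <=> $b[1];
    });
    return $rows;
}

$array = sorted_word_count_rows("/tmp");
for ($i = 0; $i < count($array); ++$i) {
    printf("%s is %d\n", $array[$i][0], $array[$i][1]);
}
```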
If you would like to keep the iterator-style approach (yet still do essentially the same as Artefacto's answer) then something like the following would suffice.
$dir_it = new FilesystemIterator("/tmp");
// Build array iterator with word counts
$arr_it = new ArrayIterator();
foreach ($dir_it as $fileinfo) {
    // Skip non-files
    if ( ! $fileinfo->isFile()) continue;
    // Note: adding a property on the fly like this is deprecated as of PHP 8.2
    $fileinfo->word_count = str_word_count(file_get_contents($fileinfo->getPathname()));
    $arr_it->append($fileinfo);
}
// Sort by word count descending
$arr_it->uasort(function($a, $b){
    return $b->word_count - $a->word_count;
});
// Display sorted files and their word counts
foreach ($arr_it as $fileinfo) {
    printf("%10d %s\n", $fileinfo->word_count, $fileinfo->getFilename());
}
Aside: If the files are particularly large (read: loading each one entirely into memory just to count the words is too much) then you could loop over the file line-by-line (or byte-by-byte if you really wanted to) with the SplFileObject.
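As a rough illustration of the line-by-line idea, assuming PHP 7 or later: the function name count_words_streaming is hypothetical. Since str_word_count() treats a newline as a word separator, summing the per-line counts should give the same total as counting the whole file at once.

```php
<?php
// Hypothetical helper: count words without loading the whole file into memory.
function count_words_streaming(string $path): int
{
    $file = new SplFileObject($path, "r");
    $total = 0;
    // Iterating an SplFileObject yields one line at a time
    foreach ($file as $line) {
        $total += str_word_count($line);
    }
    return $total;
}
```

This only holds one line in memory at a time, so a file consisting of a single enormous line would still be read in one piece.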