开发者

What can be wrong with word count program?

I've got a question in my test:

What is wrong with program that counts number of lines and words in file?

open F, $ARGV[0] || die $!;
my @lines = <F>;
my @words = map {split /\s/} @lines;
printf "%8d %8d\n", scalar(@lines), scalar(@words);
close(F); 
开发者_Python百科

My conjectures are:

  1. If file does not exist, program won't tell us about that.
  2. If there are punctuation signs in file, program will count them, for example, in

    abc cba
    , , ,dce
    

    will be five word, but on the other hand wc outputs the same result, so it might be considered as correct behavior.

  3. If F is a large file, it might be better to iterate over lines and not to dump it into lines array.

Do you have any less trivial ideas?


On the first line, you have a precedence problem:

open F, $ARGV[0] || die $!;

is the same as

open F, ($ARGV[0] || die $!);

which means the die is executed if the filename is false, not if the open fails. You wanted to say

open(F, $ARGV[0]) || die $!;

or

open F, $ARGV[0] or die $!;

Also, you should be using the 3 argument form of open, in case $ARGV[0] contains characters that mean something to open.

open F, '<', $ARGV[0] or die $!;

On a different note, splitting on /\s/ means that you get a "word" between consecutive whitespace characters. You probably meant /\s+/, or as amphetamachine suggested, /\W+/, depending on how you want to define a "word".

That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' ' to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ } to weed out empty "words", or abandon split and use a different method for counting words.

Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.


  • Your conjecture #1 is incorrect: your program will die if the open fails. (see cjm's answer re order of operations.)
  • you're using a global filehandle, rather than a lexical variable.
  • you're not using the three-argument form of open.
  • you could just read from stdin, which gives more flexibility as to input - the user can provide a file, or pipe the input into stdin.
  • lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, say something like Lingua::EN::Splitter.
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines);
while (<>)
{
    my $line = $_;
    $lines++;
    $wordcount += scalar(words $line);
}

printf "%8d %8d\n", $lines, $wordcount;


When you open F, $ARGV[0] || die $! that will effectively exit if the file doesn't exist.

There are some improvements to be made here:

{local $/; $lines = <F>;} # read all lines at once

my @words = split /\W+/, $lines;
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜