What can be wrong with word count program?
I've got a question in my test:
What is wrong with program that counts number of lines and words in file?
open F, $ARGV[0] || die $!;
my @lines = <F>;
my @words = map {split /\s/} @lines;
printf "%8d %8d\n", scalar(@lines), scalar(@words);
close(F);
开发者_Python百科
My conjectures are:
- If file does not exist, program won't tell us about that.
If there are punctuation signs in file, program will count them, for example, in
abc cba , , ,dce
will be five word, but on the other hand
wc
outputs the same result, so it might be considered as correct behavior.- If
F
is a large file, it might be better to iterate over lines and not to dump it intolines
array.
Do you have any less trivial ideas?
On the first line, you have a precedence problem:
open F, $ARGV[0] || die $!;
is the same as
open F, ($ARGV[0] || die $!);
which means the die
is executed if the filename is false, not if the open
fails. You wanted to say
open(F, $ARGV[0]) || die $!;
or
open F, $ARGV[0] or die $!;
Also, you should be using the 3 argument form of open, in case $ARGV[0]
contains characters that mean something to open
.
open F, '<', $ARGV[0] or die $!;
On a different note, splitting on /\s/
means that you get a "word" between consecutive whitespace characters. You probably meant /\s+/
, or as amphetamachine suggested, /\W+/
, depending on how you want to define a "word".
That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' '
to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ }
to weed out empty "words", or abandon split
and use a different method for counting words.
Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.
Your conjecture #1 is incorrect: your program will die if the(see cjm's answer re order of operations.)open
fails.- you're using a global filehandle, rather than a lexical variable.
- you're not using the three-argument form of
open
. - you could just read from stdin, which gives more flexibility as to input - the user can provide a file, or pipe the input into stdin.
- lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, say something like Lingua::EN::Splitter.
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines);
while (<>)
{
my $line = $_;
$lines++;
$wordcount += scalar(words $line);
}
printf "%8d %8d\n", $lines, $wordcount;
When you open F, $ARGV[0] || die $!
that will effectively exit if the file doesn't exist.
There are some improvements to be made here:
{local $/; $lines = <F>;} # read all lines at once
my @words = split /\W+/, $lines;
精彩评论