What can be wrong with this word count program?

2023-01-21 07:26

I got this question on a test:

What is wrong with this program, which counts the number of lines and words in a file?

open F, $ARGV[0] || die $!;
my @lines = <F>;
my @words = map {split /\s/} @lines;
printf "%8d %8d\n", scalar(@lines), scalar(@words);
close(F); 

My conjectures are:

1. If the file does not exist, the program won't tell us about it.
2. If the file contains punctuation, the program counts it as words. For example,
```
abc cba
, , ,dce
```
is counted as five words. On the other hand, wc reports the same result, so this might be considered correct behavior.
3. If F is a large file, it might be better to iterate over the lines instead of reading them all into the @lines array.

Do you have any less trivial ideas?

On the first line, you have a precedence problem:

open F, $ARGV[0] || die $!;

is the same as

open F, ($ARGV[0] || die $!);

which means that die runs when the filename is false, not when the open fails. You wanted to say either of these:

open(F, $ARGV[0]) || die $!;

open F, $ARGV[0] or die $!;

Also, you should be using the three-argument form of open, in case $ARGV[0] contains characters that mean something to open (such as a leading > or a trailing |).

open F, '<', $ARGV[0] or die $!;
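
To make the two fixes concrete, here is a minimal sketch that combines the low-precedence or with the three-argument open. The helper name open_for_reading, the lexical filehandle, and the wording of the error message are my own choices for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): open a file for
# reading and return a lexical filehandle, or die with a readable message.
sub open_for_reading {
    my ($path) = @_;
    # "or" binds more loosely than the open(...) call, so die runs only
    # when open itself fails, not when $path happens to be false.
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    return $fh;
}

# eval traps the die so the failure can be shown without ending the script.
my $fh = eval { open_for_reading('/no/such/dir/file.txt') };
print "open failed: $@" unless $fh;
```

With the '<' mode given explicitly, a filename such as ">data" is treated as a literal name rather than as a request to open the file for writing.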

On a different note, splitting on /\s/ means that you get an empty "word" between consecutive whitespace characters. You probably meant /\s+/, or as amphetamachine suggested, /\W+/, depending on how you want to define a "word".

That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' ' to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ } to weed out empty "words", or abandon split and use a different method for counting words.
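
The differences between these variants are easy to see on a single line. This is just a sketch; the sample string and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Sample line: two leading spaces and three spaces between the words.
my $line = "  abc   cba\n";

my @single   = split /\s/,  $line;                    # one field per whitespace char
my @plus     = split /\s+/, $line;                    # runs collapsed, leading empty field kept
my @awk      = split ' ',   $line;                    # special case: leading whitespace skipped
my @filtered = grep { length $_ } split /\s/, $line;  # empty fields weeded out afterwards

printf "%-28s %d\n", 'split /\s/:',           scalar @single;    # 6
printf "%-28s %d\n", 'split /\s+/:',          scalar @plus;      # 3
printf "%-28s %d\n", q{split ' ':},           scalar @awk;       # 2
printf "%-28s %d\n", 'grep { length } split:', scalar @filtered; # 2
```

Only the last two agree with what wc would call a word on this line.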

Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.
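
For what it's worth, a line-by-line version might look roughly like this. The helper name count_lines_and_words is hypothetical, and the in-memory filehandle is only there so the sketch runs without a real file; it uses the sample input from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines and whitespace-separated words from an
# already opened filehandle, holding only one line in memory at a time.
sub count_lines_and_words {
    my ($fh) = @_;
    my ($lines, $words) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        my @fields = split ' ', $line;   # ' ' skips leading whitespace
        $words += @fields;               # array in numeric context is its length
    }
    return ($lines, $words);
}

# Demonstration on an in-memory filehandle instead of a file on disk.
my $sample = "abc cba\n, , ,dce\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!\n";
printf "%8d %8d\n", count_lines_and_words($fh);   # 2 lines, 5 words
close($fh);
```

Memory use then stays proportional to the longest line rather than to the size of the file.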

- ~~Your conjecture #1 is incorrect: your program will die if the open fails.~~ (See cjm's answer regarding order of operations.)
- You're using a global filehandle rather than a lexical variable.
- You're not using the three-argument form of open.
- You could just read from STDIN, which gives more flexibility as to input: the user can name a file or pipe the input in.
- Lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, for something like Lingua::EN::Splitter.

use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines) = (0, 0); # start at zero so empty input prints without warnings
while (<>)
{
    my $line = $_;
    $lines++;
    $wordcount += scalar(words $line);
}

printf "%8d %8d\n", $lines, $wordcount;

Note that open F, $ARGV[0] || die $! does not exit when the file doesn't exist: || binds more tightly than the list operator open, so die only runs when $ARGV[0] itself is false.

There are some improvements to be made here:

my $text = do { local $/; <F> }; # slurp the whole file into one string

my @words = split /\W+/, $text;
