开发者

Counting total sum of each value in one column w.r.t another in Perl

I have tab delimited data with multiple columns.

I have OS names in column 31 and data bytes in columns 6 and 7. What I want to do is count the total volume of each unique OS.

So, I did something in Perl like this:

#!/usr/bin/perl
use warnings;

my @hhfilelist  = glob "*.txt";
my %count = ();

for my $f (@hhfilelist) {
    open F, $f || die "Cannot open $f: $!";
    while (<F>) {
        chomp;
        my @line = split /\t/;
        # counting volumes in col 6 and 7 for 31
        $count{$line[30]} = $line[5] + $line[6];     
    }
    close (F);
}

my $w = 0;

foreach $w (sort keys %count) {
    print "$w\t$count{$w}\n";
}

So, the result would be something like

Windows    100000
Linux        5000
Mac OSX     15000
Android      2000

But there seems to be some error in this code because the r开发者_StackOverflow中文版esulting values I get aren't as expected.

What am I doing wrong?


It seems like you're not actually ADDING up the counts - you overwrite the last count for any OS with the count from the last line for that OS.

$count{$line[30]} = $line[5] + $line[6];

Should be

$count{$line[30]} += $line[5] + $line[6];

As additional considerations that can improve your code overall but don't affect the correctness of it:

  1. Please use 3-argument form of open and Lexical filehandles:

     open(my $filehandle, "<", $f) || die "Cannot open $f: $!";
    
  2. If you're 100% sure you file doesn't contain quoted field values or tabs in the field contents, your split based logic is OK. For really complicated X-separated files, I would strongly recommend using Text::CSV_XS/Text::CSV CPAN module

  3. Don't need to initialize %count or $w variables - hash will get autoinitialized to empty hash, and $w gets assigned to as loop variable - you may want to actually declare it in the loop itself: foreach my $w (sort keys %count) {

  4. Please don't use 1-letter variables. $w is meaningless in the last loop, whereas $os_name is clear.


Your expression

open F, $f || die "Cannot open $f: $!";

has a subtle bug in it that will eventually bite you, though probably not today.

The || operator has higher precedence than the comma-operator to its left and so this expression actually gets parsed as

open F, ($f || die "Cannot open $f: $!")

which is to say, you will die when $f has a false (0, "", or undef) value, not when the open statement fails to open the file with the name given by $f.

To do what you mean, you could either use parentheses:

open (F, $f) || die ...

or use the alternate low-precedence or operator

open F, $f  or  die ...

(At times I have been bitten by this myself)


$count{$line[30]} = $line[5] + $line[6];

should use the += operator to add the row's sum to the total, rather than set it as the total:

$count{$line[30]} += $line[5] + $line[6];
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜