Perl Help...Read file..do math...write file? New at Perl
I have a file that is as such:
DATA_SET1
INFO1 INFO2 INFO3 = ### ### ###
INFO4 = ###
INFO5 = ###
INFO6 = ###
INFO7 = ###
INFO8 = ###
DATA_SET2
INFO1 INFO2 INFO3 = ### ### ###
INFO4 = ###
INFO5 = ###
INFO6 = ###
INFO7 = ###
INFO8 = ###
etc...
I need to compute some statistics on the numbers, e.g. the average of the INFO4 values from DATA_SET1, DATA_SET2, etc. Then I have to write the average to another file:
STATISTICS:
INFO4 Average: ###
I am VERY VERY new to Perl. This is probably very easy to do; I just have no idea where to start.
Thank you for your help!
First you need to get the file into a data structure. Something like this should work if the formatting is always much the same as in your sample.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper; # for printing the hash of results at the end

my $file = shift @ARGV; # Specify the file as first command line argument
die "Usage: $0 FILE\n" unless defined $file;
open my $fh, '<', $file or die "Cannot open '$file': $!";

my %data;
my $current_set = 'default';
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /(DATA_SET\d+)/) {
        $current_set = $1;
    } elsif ($line =~ /=/) {
        # Split "names = values" once, then split each side on runs of whitespace
        my ($vars, $vals) = split(/\s*=\s*/, $line, 2);
        my @vars = split(' ', $vars);
        my @vals = split(' ', $vals);
        die "length of variable declarations is not equal to length of value declarations"
            unless (@vars == @vals);
        while (@vars) {
            my $var = shift @vars;
            my $val = shift @vals;
            $data{$current_set}{$var} = $val;
        }
    }
}
close $fh;
print Dumper \%data;

# This assumes that each DATA_SET has an INFO4 term.
# N.B. A missing INFO4 is undef: it adds zero to the sum (with a warning)
# but still counts toward the divisor.
my @INFO4 = map { $data{$_}{'INFO4'} } keys %data;
die "Nothing to average" unless @INFO4;
my $sum = 0;
foreach (@INFO4) {
    $sum += $_;
}
my $av = $sum / scalar @INFO4;
print "$av\n";
At the end I just print the resulting data structure; you will need to do your own homework to put it to use (EDIT: added averaging over the INFO4 terms). perldoc is a good place to start. Also, if you need some high-powered math, I would look at the Perl Data Language (PDL), which implements fast, Matlab-like array math in Perl.
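As one possible way to use that structure, here is a minimal sketch that averages one field across all data sets and writes the result in the format the question asks for. The helper name average_of, the sample values, and the output file name statistics.txt are assumptions for illustration only; %data is a hand-written stand-in with the same shape as the hash built above. Unlike the averaging above, this version skips data sets that lack the field instead of counting them as zero.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample with the same shape as the %data hash built above.
my %data = (
    DATA_SET1 => { INFO4 => 10, INFO5 => 3 },
    DATA_SET2 => { INFO4 => 20, INFO5 => 5 },
    DATA_SET3 => { INFO5 => 7 },    # no INFO4 here: skipped, not counted as zero
);

# Average one field over every data set that actually defines it.
# Returns undef (in scalar context) when no data set has the field.
sub average_of {
    my ($data, $field) = @_;
    my @values = grep { defined } map { $data->{$_}{$field} } keys %$data;
    return unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

my $out_file = 'statistics.txt';    # assumed output file name
open my $out, '>', $out_file or die "Cannot write '$out_file': $!";
print {$out} "STATISTICS:\n";
my $avg = average_of(\%data, 'INFO4');
printf {$out} "INFO4 Average: %s\n", defined $avg ? $avg : 'n/a';
close $out or die "Cannot close '$out_file': $!";
```

To report other fields, call average_of again with a different field name and print one more line per field.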
Good luck.
If you're just calculating the average of the INFO4 values, you could probably just use a regex to identify those lines and pull out the values. Here's a rough sample script to get you started (these aren't necessarily best practices, but I tried to make it clear what is going on). It reads a data file and adds to the running total whenever a line holds an INFO4 value. (I used substr, assuming there are no errors in the input data, but again, this is just a rough answer that works for your sample case.) Note that int truncates each value to an integer before it is added, while the division can still produce a fractional result, so you may want sprintf to round the average as needed. Hope this helps.
open(my $in, '<', 'file.txt') or die "Unable to open input file.\n";
my ($average, $count) = (0, 0);
while (my $line = <$in>) {
    chomp($line);
    if ($line =~ m/INFO4/i) {
        # "INFO4 = " is 8 characters, so the value starts at offset 8
        $average += int(substr($line, 8));
        $count++;
    }
}
close($in);
$average = ($average / $count) if $count > 0;
open(my $out, '>', 'output.txt') or die "Unable to open output file.\n";
print $out "INFO4 Average: $average\n";
close($out);
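The regex and sprintf ideas mentioned above can be sketched as follows. This is only an illustration: the sample lines are made up, and the two-decimal format is an assumption. Capturing the number with a pattern avoids depending on the exact column where the value starts, and it keeps any decimals that int would throw away.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the contents of file.txt.
my @lines = (
    'DATA_SET1',
    'INFO4 = 10',
    'INFO5 = 99',
    'DATA_SET2',
    'INFO4 = 15.5',
);

my ($sum, $count) = (0, 0);
for my $line (@lines) {
    # Capture the number after "INFO4 =", whatever the spacing.
    if ($line =~ /^\s*INFO4\s*=\s*(-?\d+(?:\.\d+)?)/i) {
        $sum += $1;
        $count++;
    }
}

# sprintf formats the average with two decimal places for display.
my $average = $count ? sprintf('%.2f', $sum / $count) : 'n/a';
print "INFO4 Average: $average\n";    # prints "INFO4 Average: 12.75"
```

Change the '%.2f' format to control how many decimal places are shown.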
I'd start with Learning Perl or Programming Perl, and with the Perl Cookbook if you already know more than one language.