开发者

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.

This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:

01 my $config_var = 999;

03 my $result_var = 0;

05 foreach my $file (@files) {
06   open(my $fh, $file);
07   while (<$fh>) {
08     &analyzeLine($_);
09   }
10 }

12 print "$result_var\n";

14 sub analyzeLine ($) {
15   my $line = shift(@_);
16   $result_var = $result_var + calculatedStuff;
17 }

In real life, there are up to about half a dozen different config_vars and result_vars.

These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.

I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by

do theCommonStuff;

Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.

Should I pack my "common" code into a module? Create a class? Use global variables?

While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.

Looking forward to your suggestions, and thanks!


Update

Since people asked, here's the real analyzeLine:

# Update stats with t开发者_开发百科ime and call data in one line.
sub processLine ($) {
  my $line = shift(@_);
  return unless $line =~ m/$log_match/;
  # print "$1 $2\n";
  my ($minute, $function) = ($1, $2);
  $startMinute = $minute if not $startMinute;
  $endMinute = $minute;
  if ($minute eq $currentMinute) {
    $minuteCount = $minuteCount + 1;
  } else {
    if ($minuteCount > $topMinuteCount) {
      $topMinute = $currentMinute;
      $topMinuteCount = $minuteCount;
      printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
    }
    $totalMinutes = $totalMinutes + 1;
    $totalCount = $totalCount + $minuteCount;
    $currentMinute = $minute;
    $minuteCount = 1;
  }
}

Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.


Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:

A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.

In short, using & can be surprising unless you know what you are doing and why you are doing it.

Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.

Do not forget to check the return value of system calls such as open. Use autodie with modern perls.

For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.

#!/usr/bin/perl

use warnings; use strict;
use autodie;

my %config = (
    frobnicate => 'yes',
    machinate  => 'no',
);

my $result;
$result += analyze_file(\%config, $_) for @ARGV;

print "Result = $result\n";

sub analyze_file {
    my ($config, $file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += analyze_line($config, $line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my ($line) = @_;
    return length $line;
}

Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:

#!/usr/bin/perl

package My::Analyzer;

use strict; use warnings;

use base 'Class::Accessor::Faster';

__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );

sub analyze_file {
    my $self = shift;
    my ($file) = @_;

    my $result;

    open my $fh, '<', $file;
    while ( my $line = <$fh> ) {
        $result += $self->analyze_line($line);
    }

    close $fh;

    return $result;
}

sub analyze_line {
    my $self = shift;
    my ($line) = @_;
    return $self->get_analyzer->($line);
}

package main;

use warnings; use strict;
use autodie;

my $x = My::Analyzer->new;

$x->set_analyzer(sub {
        my $length; $length += length $_ for @_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');


my $result;
$result += $x->analyze_file($_) for @ARGV;

print "Result = $result\n";


Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming. Here's an example:

package Common;
sub new{
  my $class=shift;
  my $this=bless{},$class;
  $this->init();
  return $this;
}
sub init{}
sub theCommonStuff(){ 
  my $this=shift;
  for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
  my($this,$line)=@_;
  $this->{'result'}.=$line;
}

package Special1;
our @ISA=qw/Common/;
sub init{
  my $this=shift;
  $this->{'sep'}=',';   # special param: separator
}
sub analyzeLine(){      # modified logic
  my($this,$line)=@_;
  $this->{'result'}.=$line.$this->{'sep'};
}

package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";


If all the common code is in one function, a function taking your config variables as parameters, and returning the result variables (either as return values, or as in/out parameters), will do. Otherwise, making a class ("package") is a good idea, too.

sub common_func {
    my ($config, $result) = @_;
    # ...
    $result->{foo} += do_stuff($config->{bar});
    # ...
}

Note in the above that both the config and result are hashes (actually, references thereto). You can use any other data structure that you feel will suit your goal.


Some thoughts:

  • If there are several $result_vars, I would recommend creating a separate subroutine for calculating each one.
  • If a subroutine relies on information outside that function, it should be passed in as a parameter to that subroutine, rather than relying on global state.
  • Alternatively wrap the whole thing in a class, with $result_var as an attribute of the class.

Practically speaking, there are a couple ways you could implement this:

(1) Have your &analyzeLine function return calculatedStuff, and add it to &result_var in a loop outside the function:

  $result_var = 0;
  foreach my $file (@files) {
      open(my $fh, $file);
          while (<$fh>) {
              $result_var += analyzeLine($_);
          }
      }
  }

  sub analyzeLine ($) {
      my $line = shift(@_);
      return calculatedStuff;
  }

(2) Pass $result_var into analyzeLine explicitly, and return the changed $result_var.

  $result_var = 0;
  foreach my $file (@files) {
      open(my $fh, $file);
          while (<$fh>) {
              $result_var = addLineToResult($result_var, $_);
          }
      }
  }

  sub addLineToResult ($$) {
      my $running_total = shift(@_);
      my $line = shift(@_);
      return $running_total + calculatedStuff;
  }

The important part is that if you separate out functions for each of your several $result_vars, you'll be more readily able to write clean code. Don't worry about optimizing yet. That can come later, when your code has proven itself slow. The improved design will make optimization easier when the time comes.


why not create a function and using $config_var and $result_var as parameters?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜