How do I print a computed score for all input in a file?

Here is some Perl code that takes two files of TCP packets as input. It trains on the packets in the first file to learn what normal traffic looks like, then prints the anomalous packets in the second file.

while (<>) {
    if (($time, $to, $port, $from, $duration, $flags, $length, $text) = /(.{19}) (.{15}):(\d+) (.{15}):\d+ \+(\d+) (\S+) (\d+) (.*)/) {
        $text =~ s/\^M//g;
        $text =~ s/\^ /\n/g;
        if (($port == 25 || $port == 80) && $text =~ /\n\n/) {$text = "$`\n";}
        $text =~ s/^\^@//;
        if ($time =~ /(\d\d)\/(\d\d)\/\d\d\d\d (\d\d):(\d\d):(\d\d)/) {
            $now = ((($1 * 31 + $2) * 24 + $3) * 60 + $4) * 60 + $5;
        }
        foreach ($text =~ /.*\n/g) {
            if (($k, $v) = /(\S*)(.*)/) {
                $k = substr($k, 0, 30);
                $v = substr($v, 0, 100);
                $score   = 0;
                $comment = "";
                &alarm($port,       $k);
                &alarm($to,         $flags);
                &alarm("To",        "$to:$port");
                &alarm($to,         $from);
                &alarm("$to:$port", $from);
                if ($score > 30000) {
                    $score = log($score) / (10 * log(10));
                    printf("    #   0 $time $to %8.6f \#%s\n", $score, substr($comment, 0, 300));
                }
            }
        }
    }
}

sub alarm {
    local ($key, $val, $sc) = @_;
    if ($now < 10300000) {
        ++$n{$key};
        if (++$v{$key . $val} == 1) {
            ++$r{$key};
            $t{$key} = $now;
        }
    } elsif ($n{$key} > 0 && !$v{$key . $val}) {
        $score += ($now - $t{$key}) * $n{$key} / $r{$key};
        $comment .= " $key=$val";
        $t{$key} = $now;
    }
}

exit;

I am new to Perl. As a small part of my project, I need the anomaly score to be printed for every packet in the second file. Can anybody tell me how to modify the code?


From what I can see, the code as it stands treats packets before some cutoff time as training data, and records which key/value combinations it has seen in the %n and %v hashes.

Why not give your alarm function an extra flag called $training? If it is true, just account for the packet values; otherwise, calculate a score for this anomaly (if it is one) and return that value. If there is no anomaly, or if you're in training mode, just return zero:

    sub alarm {
        my ($key, $val, $training) = @_;
        my $score = 0;
        if ( $training ) {
            # ...do your accounting...
        } else {
            # ...do your comparisons & set $score accordingly...
        }
        return $score;
    }
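
As a rough sketch of how the two branches might be filled in, here is the accounting and scoring logic from the question's alarm moved behind a $training flag. The sub name check_packet_values, the explicit $now parameter, and the hash names are assumptions made for illustration; the arithmetic itself is copied from the original.

```perl
use strict;
use warnings;

# Hypothetical names standing in for %n, %v, %r and %t in the original.
my (%n_seen, %value_seen, %distinct_values, %last_novel_time);

# Returns 0 in training mode or when the value is already known;
# otherwise returns the anomaly contribution for this key/value pair.
sub check_packet_values {
    my ($key, $val, $now, $training) = @_;
    my $score = 0;
    if ($training) {
        # Accounting: count the key, and note each value's first sighting.
        ++$n_seen{$key};
        if (++$value_seen{"$key$val"} == 1) {
            ++$distinct_values{$key};
            $last_novel_time{$key} = $now;
        }
    }
    elsif ($n_seen{$key} && !$value_seen{"$key$val"}) {
        # Scoring: a never-seen value for a key that was seen in training.
        $score = ($now - $last_novel_time{$key})
               * $n_seen{$key} / $distinct_values{$key};
        $last_novel_time{$key} = $now;
    }
    return $score;
}
```

Called with a true $training for every packet in the first file and a false one for the second, this returns 0 for known values and a positive number for new ones, so the caller can add the results up itself instead of relying on a global $score.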

Throw your big while into a subroutine, and have that subroutine take a filename and whether it is in training mode or not.

    sub examine {
        my ($file, $training) = @_;
        if ( open my $fh, '<', $file ) {
            while (<$fh>) {
                # ...this is your big while loop...
                # ...pass $training along to your alarm() calls...
            }
            close $fh;
        } else {
            die "Failed to open $file: $!\n";
        }
    }

Your main program is now:

     use constant TRAINING => 1;

     examine('file1',  TRAINING);
     examine('file2', !TRAINING);

More notes:

  • Use my() instead of local. It doesn't materially affect this program, but it's a good habit to get into.
  • Don't reuse the name of a well-known built-in like alarm for a function that does something unrelated. Name it something like check_packet_values, or whatever makes sense to you and your team.
  • Stop using magic numbers:

    use constant {
        CUTOFF_TIME       => 10300000,
        ANOMALY_THRESHOLD =>    30000,
    };

  • Use a real date/time parser so that your values have some meaning. str2time from Date::Parse would give you your time in epoch seconds (seconds since Jan 1, 1970).

  • Use variable names that mean something. %n and %v are hard to understand in this code, but %n_seen and %value_seen (and %first_seen_time instead of %t) explain themselves. Remember, your code doesn't run faster if you use shorter variable names.
  • Avoid global variables where feasible. The counters can be global, but your comment should be built only in the routine that initializes and prints it. So, instead of doing what you are doing, how about:

    $to_score = check_packet_values($to, $flags)
        and push @comments, "$to=$flags";
    ...
    $score = $to_score + $from_score + ...
    if ( !$training && $score > ANOMALY_THRESHOLD ) {
        print "blah blah blah @comments\n";
    }
    
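  • Also, avoid $` (and its siblings $& and $'). On Perl versions before 5.20, a single use anywhere in the program slows down every successful regex match in the entire script, even if that line never runs. Instead of:

    if ( $text =~ /\n\n/ ) { $text = "$`\n" }

use a capture group. The /s modifier lets . match newlines, and the non-greedy .*? stops at the first blank line, which is what $` gave you:

    if ( $text =~ /\A(.*?)\n\n/s ) {
        $text = "$1\n";
    }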
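
To check that a capture group really can stand in for the prematch variable, here is a small self-contained sketch. The helper name header_part and the sample text are made up for illustration. Note that . does not match a newline by default, so the sketch uses /s together with a non-greedy quantifier to keep everything before the first blank line.

```perl
use strict;
use warnings;

# Returns everything before the first blank line plus a closing newline,
# mirroring what the prematch variable produced in the original code;
# returns the text unchanged when there is no blank line.
sub header_part {
    my ($text) = @_;
    # \A anchors at the start, /s lets . cross newlines, and the
    # non-greedy .*? stops at the first "\n\n" rather than the last.
    return $text =~ /\A(.*?)\n\n/s ? "$1\n" : $text;
}

# Prints only the two header lines of this invented sample.
print header_part("GET / HTTP/1.0\nHost: example\n\nbody one\n\nbody two\n");
```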

(Edit: added warning about $`)
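
On the date/time note above: Date::Parse is not a core module, so the following sketch swaps in the core Time::Local module to get real epoch seconds from the MM/DD/YYYY HH:MM:SS stamps that the question's regex expects. The sub name packet_epoch is hypothetical, and treating the stamps as UTC is an assumption, since the question does not say which time zone the capture used.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Converts "MM/DD/YYYY HH:MM:SS" to epoch seconds, treating the stamp
# as UTC (an assumption). Returns undef when the string does not match.
# timegm itself dies on impossible dates such as month 13.
sub packet_epoch {
    my ($stamp) = @_;
    my ($mon, $day, $year, $h, $m, $s) =
        $stamp =~ m{\A(\d\d)/(\d\d)/(\d{4}) (\d\d):(\d\d):(\d\d)\z}
        or return;
    # timegm wants a zero-based month.
    return timegm($s, $m, $h, $day, $mon - 1, $year);
}
```

With real epoch values, differences such as $now - $t{$key} are true elapsed seconds, whereas the original formula treats every month as 31 days. The training cutoff would then need to be expressed as an epoch value too.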


I may have misunderstood your question and comment, so forgive me if this isn't what you're asking...

Your printf call currently sits inside the if ($score > 30000) check, so you only get output when $score exceeds 30000.

if ($score>30000) {
    $score=log($score)/(10*log(10));
    printf("    #   0 $time $to %8.6f \#%s\n", $score, substr($comment, 0, 300));
}

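If you want to print the output regardless of the $score, you just need to move the printf line outside the if check. Bear in mind that the log scaling still happens only inside the if, so scores at or below 30000 would print unscaled; if you move that line out as well, guard it, because log(0) is a fatal error in Perl.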
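
A minimal sketch of that change, pulled out into a helper so it can be tried on its own. The name format_score_line, the floor of 0 for raw scores below 1, and the sample addresses are assumptions for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Formats one output line for a packet regardless of its raw score.
# log(0) is a fatal error in Perl, so raw scores below 1 are reported
# as 0 (an assumption; choose whatever floor suits your data).
sub format_score_line {
    my ($time, $to, $raw_score, $comment) = @_;
    my $scaled = $raw_score >= 1 ? log($raw_score) / (10 * log(10)) : 0;
    return sprintf("    #   0 %s %s %8.6f #%s\n",
        $time, $to, $scaled, substr($comment, 0, 300));
}

# Invented sample packets: one with no anomaly, one with a large raw score.
print format_score_line("03/31/1999 08:00:00", "172.016.112.050", 0, "");
print format_score_line("03/31/1999 08:00:01", "172.016.112.050", 1_000_000, " 80=GET");
```

In the original loop, the call would replace the whole if ($score > 30000) block, so every packet in the second file produces a line.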
