How do I print a computed score for all input in a file?
Here is some Perl code which takes two files as input. The files contain TCP packets. It trains itself on the normal packets in the first file and then prints the anomalous packets in the second file.
while (<>) {
    if (($time, $to, $port, $from, $duration, $flags, $length, $text) = /(.{19}) (.{15}):(\d+) (.{15}):\d+ \+(\d+) (\S+) (\d+) (.*)/) {
        $text =~ s/\^M//g;
        $text =~ s/\^ /\n/g;
        if (($port == 25 || $port == 80) && $text =~ /\n\n/) {$text = "$`\n";}
        $text =~ s/^\^@//;
        if ($time =~ /(\d\d)\/(\d\d)\/\d\d\d\d (\d\d):(\d\d):(\d\d)/) {
            $now = ((($1 * 31 + $2) * 24 + $3) * 60 + $4) * 60 + $5;
        }
        foreach ($text =~ /.*\n/g) {
            if (($k, $v) = /(\S*)(.*)/) {
                $k = substr($k, 0, 30);
                $v = substr($v, 0, 100);
                $score = 0;
                $comment = "";
                &alarm($port, $k);
                &alarm($to, $flags);
                &alarm("To", "$to:$port");
                &alarm($to, $from);
                &alarm("$to:$port", $from);
                if ($score > 30000) {
                    $score = log($score) / (10 * log(10));
                    printf(" # 0 $time $to %8.6f \#%s\n", $score, substr($comment, 0, 300));
                }
            }
        }
    }
}
sub alarm {
    local ($key, $val, $sc) = @_;
    if ($now < 10300000) {
        ++$n{$key};
        if (++$v{$key . $val} == 1) {
            ++$r{$key};
            $t{$key} = $now;
        }
    } elsif ($n{$key} > 0 && !$v{$key . $val}) {
        $score += ($now - $t{$key}) * $n{$key} / $r{$key};
        $comment .= " $key=$val";
        $t{$key} = $now;
    }
}
exit;
I am new to Perl, and as a small part of my project I need the anomaly score to be printed for all the packets in the second file. Can anybody tell me how to modify the code?
From what I can see here, the code (as it is now) looks at packets before some cutoff time and stores whether or not it has seen certain conditions in the %n and %v hashes.
Why not give your alarm function an extra flag called $training? If true, just account for the packet values; otherwise, calculate a score for this anomaly (if it is one) and return that value. If there is no anomaly, or if you're in training mode, just return zero:
sub alarm {
    my ($key, $val, $training) = @_;
    my $score = 0;
    if ( $training ) {
        ...do your accounting...
    } else {
        ...do your comparisons & set score accordingly...
    }
    return $score;
}
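One way the two placeholders might be filled in, reusing the arithmetic from the question's alarm routine. This is only a sketch under assumptions: the names check_packet_values, %n_distinct and %last_novel_time are hypothetical, and $now is passed in as an argument here rather than read from a global.

```perl
use strict;
use warnings;

# Hypothetical replacements for the question's %n, %v, %r and %t hashes.
my (%n_seen, %value_seen, %n_distinct, %last_novel_time);

# Returns 0 in training mode or when the value is already known;
# otherwise returns the anomaly score for this key/value pair.
sub check_packet_values {
    my ($key, $val, $now, $training) = @_;
    my $score = 0;
    if ($training) {
        # Accounting: count observations and distinct values per key.
        ++$n_seen{$key};
        if (++$value_seen{$key . $val} == 1) {
            ++$n_distinct{$key};
            $last_novel_time{$key} = $now;
        }
    }
    elsif ($n_seen{$key} && !$value_seen{$key . $val}) {
        # Same formula as the original: time since the last novel value,
        # scaled by observations per distinct value.
        $score = ($now - $last_novel_time{$key}) * $n_seen{$key} / $n_distinct{$key};
        $last_novel_time{$key} = $now;
    }
    return $score;
}

# Train on three observations, then score one known and one new value.
check_packet_values('80', 'GET',  100, 1);
check_packet_values('80', 'GET',  200, 1);
check_packet_values('80', 'POST', 300, 1);
my $known = check_packet_values('80', 'GET', 1000, 0);
my $novel = check_packet_values('80', 'PUT', 1000, 0);
```

Because the function returns the score instead of adding to a global, the caller can sum the five results per packet and decide for itself whether to print.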
Throw your big while loop into a subroutine, and have that subroutine take a filename and a flag saying whether it is in training mode or not.
sub examine {
    my ($file, $training) = @_;
    if ( open my $fh, '<', $file ) {
        while (<$fh>) {
            ...this is your big while loop...
            ...pass $training along to your alarm() calls...
        }
    } else {
        die "Failed to open $file: $!\n";
    }
}
Your main program is now:
use constant TRAINING => 1;
examine('file1', TRAINING);
examine('file2', !TRAINING);
More notes:
- Use my() instead of local. It doesn't materially affect this program, but it's a good habit to get into.
- Don't use a well-known function name like alarm (it is a Perl built-in) when your routine isn't doing anything of the kind. Name it something like check_packet_values instead, or something that makes sense to you and your team.
- Stop using magic numbers:
    use constant { CUTOFF_TIME => 10300000, ANOMALY_THRESHOLD => 30000 };
- Use a real date/time parser so that your values have some meaning. str2time from Date::Parse would give you your time in epoch seconds (seconds since Jan 1, 1970).
- Use variable names that mean something. %n and %v are hard to understand in this code, but %n_seen and %value_seen are not (likewise %first_seen_time instead of %t). Remember, your code doesn't run faster if you use shorter variable names.
- Stop using global variables when feasible. The counters can be global, but your comment should be built only in the routine which is initializing and printing the comment. So, instead of doing what you are doing, how about:
    $to_score = check_packet_values($to, $flags) and push @comments, "$to=$flags";
    ...
    $score = $to_score + $from_score + ...
    if ( !$training && $score > ANOMALY_THRESHOLD ) {
        print "blah blah blah @comments\n";
    }
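To illustrate the date/time point, here is a rough sketch that converts the packet timestamp to epoch seconds. Note the swap: it uses the core Time::Piece module instead of Date::Parse's str2time, only so that it runs without installing anything. The helper name packet_epoch is hypothetical, and the MM/DD/YYYY HH:MM:SS layout is an assumption based on the regex in the question.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: turn "MM/DD/YYYY HH:MM:SS" into epoch seconds.
# The timestamp is treated as UTC; returns undef if it cannot be parsed.
sub packet_epoch {
    my ($stamp) = @_;
    my $t = eval { Time::Piece->strptime($stamp, '%m/%d/%Y %H:%M:%S') };
    return defined $t ? $t->epoch : undef;
}

my $first = packet_epoch('03/01/1999 08:00:00');
my $later = packet_epoch('03/01/1999 08:00:10');
my $gap   = $later - $first;    # elapsed seconds between the two packets
```

With real epoch values, the difference between two packets is a true elapsed time, which the hand-rolled `* 31` month arithmetic in the question only approximates.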
Also, avoid $`. On Perls older than 5.20 it causes a significant performance penalty for every regex match in your entire script (even if the line that uses it is never executed). Instead of:
if ( $text =~ /\n\n/ ) { $text = $` }
Use
if ( $text =~ /^(.*?)\n\n/s ) {
    $text = $1;
}
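A small runnable sketch of the capture-group approach, in case it helps. The helper name headers_only is hypothetical. The /s modifier lets . match newlines, and the non-greedy .*? stops at the first blank line, which is what $` gave in the original. The trailing "\n" is re-added as the question's code does.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the text before the first blank line.
sub headers_only {
    my ($text) = @_;
    if ($text =~ /\A(.*?)\n\n/s) {
        return "$1\n";
    }
    return $text;    # no blank line found, leave the text untouched
}

my $trimmed = headers_only("GET / HTTP/1.0\nHost: example\n\nbody\n\nmore\n");
```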
(Edit: added warning about $`)
I may have misunderstood your question and comment, so forgive me if this isn't what you're asking...
Your printf call currently resides inside this if ($score > 30000) check, so you'll only get the output if the $score is > 30000.
if ($score>30000) {
    $score=log($score)/(10*log(10));
    printf(" # 0 $time $to %8.6f \#%s\n", $score, substr($comment, 0, 300));
}
If you want to print the output regardless of the $score, you just need to move the printf line outside this if check.
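One thing to watch if you move the log rescaling out of the check together with the printf: log(0) is a fatal error in Perl, and packets with no anomalies have a score of 0. A minimal sketch of a line formatter that prints every packet, assuming you want unscored packets reported as 0 (the helper name format_score_line is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: build the output line for any packet, anomalous or not.
sub format_score_line {
    my ($time, $to, $score, $comment) = @_;
    # Only rescale positive scores; log(0) would die with "Can't take log of 0".
    my $scaled = $score > 0 ? log($score) / (10 * log(10)) : 0;
    return sprintf(" # 0 %s %s %8.6f #%s\n",
                   $time, $to, $scaled, substr($comment, 0, 300));
}

# Prints a line even though the score is zero.
print format_score_line('03/01/1999 08:00:00', '172.016.112.100', 0, '');
```

Whether low scores should be rescaled, printed raw, or reported as 0 is a design choice that depends on what consumes the output.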