开发者

Awk parsing of unique IP addresses from maillog

Yesterday I asked a question here about a oneliner and mjschultz gave me an answer that I instantly fell in love with :) Awk开发者_如何学JAVA just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other oneliners to awk.

This is the one in question:

grep "pop3\[" maillog | grep "User logged in" |  
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u

I need the list of all unique IP addresses using pop3 to connect to the mail server.

This is an example log entry:

Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext  
User logged in

So I find all the lines containing "pop3" and I parse them for the "User logged in" part. Next i use egrep and a regex to match IP addresses and I use sort to filter out the duplicate addresses.

This is what I have so far for my awk version:

awk '/pop3\[.*.User logged in/ {ip[$7]=0} END {for (address in ip)  
{ print address} }' maillog

This works perfectly but as always not all log entries are identical, for example sometimes the IP gets moved to the 8th field like here:

Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20]  
username plaintext User logged in

What would be the best way to catch those entries with awk as well?

As always thanks for all the great responses in advance, you've taught me so much already :)


AWK code

just match your ip format ... be careful that there are no other formats ...

/pop3\[.*.User logged in/    {
         where = match($0,/\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
         if (where)
           ip[substr($0,RSTART+1,RLENGTH-1)]=0
} 

END {for (address in ip)  
{ print address} }  

running at ideone


That looks more like Perl territory than Awk to me:

my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
         $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.


After seeing and trying these approaches I got a new idea.

belisarius's code does what I asked for but since it has to do all the regex matching it's not the fastest one and speed is what I'm after.

So I came up with this, as you can see the "problematic" log lines have an extra field, making them all 13 fields long instead of the normal 12, so I just delete the extra field, this gives me the correct list of IP addresses, next i use awk again to delete all duplicate entries:

awk '/pop3\[.*.User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}'
/var/log/maillog | awk '!($0 in a){a[$0];print}'

Ideone link if you want to see the code in action

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜