Awk parsing of unique IP addresses from maillog

2023-01-25 17:20 Q&A author:

Yesterday I asked a question here about a one-liner and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other one-liners to awk.

This is the one in question:

grep "pop3\[" maillog | grep "User logged in" |  
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u

I need the list of all unique IP addresses using pop3 to connect to the mail server.

This is an example log entry:

Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in

So I find all the lines containing "pop3" and filter them for the "User logged in" part. Next I use egrep and a regex to match IP addresses, and I use sort -u to filter out the duplicate addresses.

This is what I have so far for my awk version:

awk '/pop3\[.*.User logged in/ {ip[$7]=0} END {for (address in ip)  
{ print address} }' maillog

This works perfectly, but as always not all log entries are identical. For example, sometimes the IP gets moved to the 8th field, like here:

Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in

What would be the best way to catch those entries with awk as well?

As always thanks for all the great responses in advance, you've taught me so much already :)

AWK code

just match your ip format ... be careful that there are no other formats ...

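/pop3\[.*.User logged in/ {
    # match() sets RSTART and RLENGTH; the pattern includes the leading "["
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0   # skip the "[" in the key
}

END {
    for (address in ip)
        print address
}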

running at ideone
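To see the match() approach handle both log layouts from the question, here is a small self-contained sketch. The function names sample_log and extract_ips are made up for this illustration, and the sample input is just the two log entries quoted above, with the first one repeated to show the de-duplication. The output is piped through sort because awk does not guarantee any order for "for (address in ip)".

```shell
# Sample input: the two log entries from the question, first one repeated.
sample_log() {
cat <<'EOF'
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
EOF
}

# Hypothetical wrapper around the answer's awk program; reads stdin.
extract_ips() {
    awk '
    /pop3\[.*.User logged in/ {
        # The leftmost "[" followed by a dotted quad is the address;
        # "[19418]" does not match because it has no dots.
        if (match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
            ip[substr($0, RSTART + 1, RLENGTH - 1)] = 0
    }
    END { for (address in ip) print address }
    '
}

sample_log | extract_ips | sort
```

This prints each address once, without the brackets, no matter which field it sits in.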

That looks more like Perl territory than Awk to me:

my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
         $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
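One possible fix, sketched here outside Perl and assuming a POSIX sort is available: split each address on the dots and compare the four octets numerically. The function name sort_ips is made up for this example; the two addresses are the ones mentioned above.

```shell
# Hypothetical helper: numeric sort of dotted-quad addresses read from stdin.
# -t . splits on dots; each -k N,Nn compares one octet as a number.
sort_ips() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}

printf '%s\n' 192.1.168.10 9.25.13.26 | sort_ips
```

With this ordering 9.25.13.26 comes out before 192.1.168.10, so the script's output could be piped through such a filter in place of the in-script sort.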

After seeing and trying these approaches I got a new idea.

belisarius's code does what I asked for but since it has to do all the regex matching it's not the fastest one and speed is what I'm after.

So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12. So I just delete the extra field, which gives me the correct list of IP addresses. Next I use awk again to delete all duplicate entries:

awk '/pop3\[.*.User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' \
/var/log/maillog | awk '!($0 in a){a[$0];print}'

Ideone link if you want to see the code in action
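The same idea might also fit in a single awk process, which avoids rebuilding the record with gsub and the second awk in the pipe. This is only a sketch under the assumption stated above, that matching lines have either 12 or 13 fields; a line with any other field count would yield the wrong field. The function name unique_pop3_ips is made up for this example, and unlike the command above it strips the surrounding brackets from the address.

```shell
# Hypothetical helper: pass log files as arguments, or feed lines on stdin.
unique_pop3_ips() {
    awk '
    /pop3\[.*.User logged in/ {
        # 13-field lines carry a hostname in $7, pushing the IP to $8.
        addr = (NF == 13) ? $8 : $7
        # Drop the leading "[" and trailing "]".
        addr = substr(addr, 2, length(addr) - 2)
        # Print each address only the first time it is seen.
        if (!(addr in seen)) { seen[addr]; print addr }
    }
    ' "$@"
}

# Demo on the two log entries from the question (first one repeated).
printf '%s\n' \
    'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
    'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' \
    'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' |
    unique_pop3_ips
```

On a real log it would be called as unique_pop3_ips /var/log/maillog. Addresses come out in first-seen order, as with the '!($0 in a)' filter.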
