Awk command for combining lines and summarizing them
This is the format that I have.
Source IP Destination IP Received Sent
192.168.0.1 10.10.10.1 3412 341
192.168.0.1 10.10.10.1 341 43
192.168.0.1 10.22.22.2 34 334
192.168.0.1 192.168.9.3 34 243
But I have a very large file of these. I basically want the total bandwidth for each source IP, so I need to group the rows by unique source IP and then sum the received column and the sent column for each one. The end result would be:
source ip - total received packets - total sent packets
It would also be nice to group by unique source and destination IP pairs, so I could also get
source ip - destination ip - total received packets - total sent packets
Any help would be greatly appreciated
Just looking at the source IP:
awk '
NR == 1 {next}
{
recv[$1] += $3
sent[$1] += $4
}
END {for (ip in recv) printf("%s - %d - %d\n", ip, recv[ip], sent[ip])}
' filename
For source/destination pairs, only a slight modification is needed:
awk '
NR == 1 {next}
{
key = $1 " - " $2
recv[key] += $3
sent[key] += $4
}
END {for (key in recv) printf("%s - %d - %d\n", key, recv[key], sent[key])}
' filename
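As a quick check, both versions can be run against the four sample rows from the question. This is only a sketch: the scratch file and the function names `per_source` and `per_pair` are made up for illustration, and the pair output is piped through `sort` because awk does not guarantee any order for `for (key in array)`.

```shell
# Write the sample rows from the question to a scratch file (hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Source IP Destination IP Received Sent
192.168.0.1 10.10.10.1 3412 341
192.168.0.1 10.10.10.1 341 43
192.168.0.1 10.22.22.2 34 334
192.168.0.1 192.168.9.3 34 243
EOF

# Totals per source IP: skip the header, sum columns 3 and 4 keyed by column 1.
per_source() {
  awk '
    NR == 1 { next }
    { recv[$1] += $3; sent[$1] += $4 }
    END { for (ip in recv) printf("%s - %d - %d\n", ip, recv[ip], sent[ip]) }
  ' "$1"
}

# Totals per source/destination pair: same idea with a two-field key.
per_pair() {
  awk '
    NR == 1 { next }
    { key = $1 " - " $2; recv[key] += $3; sent[key] += $4 }
    END { for (key in recv) printf("%s - %d - %d\n", key, recv[key], sent[key]) }
  ' "$1"
}

per_source "$sample"
# 192.168.0.1 - 3821 - 961

per_pair "$sample" | LC_ALL=C sort
# 192.168.0.1 - 10.10.10.1 - 3753 - 384
# 192.168.0.1 - 10.22.22.2 - 34 - 334
# 192.168.0.1 - 192.168.9.3 - 34 - 243
```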
Ruby(1.9+)
#!/usr/bin/env ruby
hash_recv=Hash.new(0)
hash_sent=Hash.new(0)
hash_src_dst_recv=Hash.new(0)
hash_src_dst_sent=Hash.new(0)
f=File.open("file")
f.readline
f.each do |line|
s = line.split
hash_recv[s[0]] += s[2].to_i
hash_sent[s[0]] += s[-1].to_i
hash_src_dst_recv[ s[0,2] ] += s[2].to_i
hash_src_dst_sent[ s[0,2] ] += s[-1].to_i
end
f.close
p hash_recv
p hash_sent
p hash_src_dst_recv
p hash_src_dst_sent
test run:
$ ruby test.rb
{"192.168.0.1"=>3787, "192.168.168.0.1"=>34}
{"192.168.0.1"=>718, "192.168.168.0.1"=>243}
{["192.168.0.1", "10.10.10.1"]=>3753, ["192.168.0.1", "10.22.22.2"]=>34, ["192.168.168.0.1", "192.168.9.3"]=>34}
{["192.168.0.1", "10.10.10.1"]=>384, ["192.168.0.1", "10.22.22.2"]=>334, ["192.168.168.0.1", "192.168.9.3"]=>243}
I would do something like this (formatted for readability, but you could write it on one line):
tail -n +2 file.txt | sort | awk ' BEGIN {start = 1;}
{
ip = $1;
if (lastip == ip) {
sum_r += $3; sum_s += $4;
}
else
{ if (!start) print lastip ": " sum_r ", " sum_s
else
start = 0;
lastip = ip; sum_r = $3; sum_s = $4;
}
}
END { print lastip ": " sum_r ", " sum_s }'
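The same sort-then-scan idea extends to source/destination pairs. The following is a minimal sketch, not a drop-in replacement: it assumes the first line of the file is the header row and that fields are whitespace-separated, and the function name `pair_totals_sorted` is made up for illustration.

```shell
# Sum received/sent per source/destination pair by sorting first, then
# emitting a total each time the (source, destination) key changes.
pair_totals_sorted() {
  tail -n +2 "$1" | LC_ALL=C sort -k1,1 -k2,2 | awk '
    {
      key = $1 " " $2
      if (NR > 1 && key != last) {      # key changed: emit the finished group
        print last ": " sum_r ", " sum_s
        sum_r = 0; sum_s = 0
      }
      last = key
      sum_r += $3; sum_s += $4
    }
    END { if (NR > 0) print last ": " sum_r ", " sum_s }   # flush the last group
  '
}
```

Because the rows arrive sorted, only the running totals for the current pair are held in memory, which may matter for a very large file; the cost is the sort itself.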
awk '{
if (NR==FNR){
Recieved[$1,$2]+=$3;Sent[$1,$2]+=$4;
}else{
if(Recieved[$1,$2]){
print $1" " $2" " Recieved[$1,$2]" "Sent[$1,$2];Recieved[$1,$2]=""
}
}
}' InputFile.txt InputFile.txt
InputFile.txt is read twice, which is why it appears twice at the end of the command. The first pass (the NR==FNR branch) builds the two arrays; the second pass (the else branch) prints each source/destination combination and then sets its array value to blank so that it is not printed again.
Glenn's solution is far superior, since it reads the file only once.
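If the point of the second pass is to print pairs in the order they first appear in the input, that can also be done in a single pass by recording each new key as it is seen. A rough sketch, assuming a header on the first line; the function name `ordered_totals` and the `order` array are illustrative, not part of any answer above. Testing membership with `in` rather than the truthiness of the running total also keeps pairs whose received total is 0.

```shell
# One pass: sum per source/destination pair, print in first-seen order.
ordered_totals() {
  awk '
    NR == 1 { next }                          # skip the header row
    {
      key = $1 " " $2
      if (!(key in recv)) order[++n] = key    # remember first appearance
      recv[key] += $3
      sent[key] += $4
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i], recv[order[i]], sent[order[i]]
    }
  ' "$1"
}
```

It would be called as `ordered_totals InputFile.txt`.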