Removing leading zeros from IP addresses: converting ipfilter.dat to bluetack.co.uk ipfilter with sed

I needed to convert a uTorrent-style ipfilter.dat into a bluetack-style ipfilter file, and wrote this shell script to do it:

#!/bin/bash

# read ipfilter.dat-formatted file line by line
# (example: 000.000.000.000-008.008.003.255,000,Badnet
# - ***here, input file's lines/fields are always the same length***)
# and convert into a bluetack.co.uk-formatted output
# (example: Badnet:0.0.0.0-8.8.3.255
# - fields moved around, leading zeros removed)

while read -r record
do
    start=$(echo ${record:0:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.')
    end=$(echo ${record:16:15} | awk -F '.' '{for(i=1;i<=NF;i++)$i=$i+0;}1' OFS='.')
    echo ${record:36:7}:${start}-${end}
done < "$1"

However, on a 2000-line input file this script takes on average 10(!) seconds to complete - a mere 200 lines/sec.

I'm sure the same result can be achieved with sed, and a sed version is likely to be much faster.

Is there a sed guru around who can suggest a solution for this kind of fixed-position replacement?

Feel free to suggest a solution in other languages as well - I would enjoy testing a Python or a C version, for example. A more efficient shell/bash version would be welcome as well.


You could try this.

sed -r 's/^0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)-0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+)\.0*([0-9]+),...,(.*)$/\9:\1.\2.\3.\4-\5.\6.\7.\8/' inputfile

I didn't test the performance but I guess it could be faster than 200 lines/sec.
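
To make the nine capture groups easier to follow, here is a minimal Python sketch of the same substitution. It is not part of the original answer: the names OCTET, IP, PATTERN and convert_line are made up for illustration, and it assumes every line has the form start-end,NNN,label as in the question.

```python
import re

# One octet: optional leading zeros (dropped), then the digits we keep.
OCTET = r"0*(\d+)"
# Four octets joined by literal dots.
IP = r"\.".join([OCTET] * 4)
# start-end,<three characters>,label  (mirrors the sed expression above)
PATTERN = re.compile(rf"^{IP}-{IP},...,(.*)$")


def convert_line(line: str) -> str:
    """Return 'label:start-end' with leading zeros stripped from each octet."""
    return PATTERN.sub(r"\9:\1.\2.\3.\4-\5.\6.\7.\8", line)


if __name__ == "__main__":
    # prints Badnet:0.0.0.0-8.8.3.255
    print(convert_line("000.000.000.000-008.008.003.255,000,Badnet"))
```

Because `0*` is greedy but `(\d+)` must match at least one digit, an octet of 000 backtracks to 0 instead of disappearing. Lines that do not match the pattern are returned unchanged.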


You sacrifice performance by using the shell's while read loop on a big file. In practice, tools such as awk and sed (and languages such as Perl, Python or Ruby) are much faster than a shell while read loop at iterating over a large file and processing its lines. On top of that, your script runs two command substitutions for every line, each piping echo into a new awk process, which adds a lot of overhead.

Ruby (1.9+)

$ cat file
000.000.000.000-008.008.003.255,000,Badnet
001.010.110.111-002.020.220.222,111,Badnet

$ ruby -F"," -ane 'puts "#{$F[-1].chomp}:" + $F[0].gsub(/\b0+(\d)/, "\\1")' file
Badnet:0.0.0.0-8.8.3.255
Badnet:1.10.110.111-2.20.220.222
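
Since the question also asked for a Python version, the same single-process idea could be sketched as follows. This is an illustration under stated assumptions, not a benchmarked solution: the function names strip_zeros, convert and convert_lines are hypothetical, and each record is assumed to be start-end,NNN,label. Everything happens inside one interpreter, with no subprocess per line.

```python
def strip_zeros(ip: str) -> str:
    """'008.008.003.255' -> '8.8.3.255' (int() drops the leading zeros)."""
    return ".".join(str(int(octet)) for octet in ip.split("."))


def convert(record: str) -> str:
    """Turn 'start-end,NNN,label' into 'label:start-end'."""
    # Split on the first two commas only, so the label may contain commas.
    ip_range, _level, label = record.rstrip("\r\n").split(",", 2)
    start, end = ip_range.split("-")
    return f"{label}:{strip_zeros(start)}-{strip_zeros(end)}"


def convert_lines(lines):
    """Yield converted records from any iterable of lines (e.g. an open file)."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield convert(line)


if __name__ == "__main__":
    sample = [
        "000.000.000.000-008.008.003.255,000,Badnet\n",
        "001.010.110.111-002.020.220.222,111,Badnet\n",
    ]
    for out in convert_lines(sample):
        print(out)
```

Passing an open file object instead of the sample list streams the input line by line, so the whole file never has to fit in memory.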


I really wanted to get this to work in a single sed command, but I wasn't able to figure it out. Surely this will still be faster than 200 lines/s though. Note that it only strips the leading zeros; it does not reorder the fields.

sed 's/\([.-]\)0\{1,2\}/\1/g' | sed 's/^0\{1,2\}//'


#!/bin/tclsh

# Tcl script that uses regsub to remove the leading zeros from an IP address.

# Author: Shoeb Masood, Bangalore

puts "Enter the IP address"
set ip [gets stdin]
set list_ip [split $ip .]
set list_ip2 {}
foreach index $list_ip {
    # Drop leading zeros but keep the last digit, so "000" becomes "0".
    regsub {^0+(\d)} $index {\1} index
    lappend list_ip2 $index
}
set list_ip2 [join $list_ip2 "."]
puts $list_ip2