Processing a log to fix a malformed IP address ?.?.?.x

I would like to replace the first character 'x' with the number '7' on every line of a log file using a shell script. Example of the log file:

216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/....

My humble beginnings...

#!/bin/bash
echo Starting script...
cd /Users/me/logs/
gzip -d /Users/me/logs/access.log.gz
echo Files unzipped...
echo "I'm totally lost here to process the log file and save it back to hd..."

exit 0

Why is the log file IP malformed like this? My web provider (1and1) has decided not to store IP addresses, so they have replaced the last number with the character 'x'. They told me it was a new requirement by 'law'. I personally think that is bs, but that would take us off topic.

I want to process these log files with AWstats, so I need an IP address that is not malformed. I want to replace the x with a 7, like so:

216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....

Not perfect, I know, but at least I can process the files, and I can still gain a lot of useful information like country, number of visitors, etc. The log files are 200MB each, so I thought that a shell script is the way to go because I can do that rapidly on my MacBook Pro locally. Unfortunately, I know very little about shell scripting, and my JavaScript skills are not going to cut it this time. I appreciate your help.


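Since everyone is posting their alternative solutions, I'm going to post one that I think is very simple:

sed 's/\.x/.7/' input_file > output_file

This replaces the first ".x" on each line with ".7". The single quotes matter: without them the shell strips the backslash, and sed then sees an unescaped "." that matches any character.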

hope it helps! :)


The following perl one-liner should do the trick:

perl -p -i -e 's/\.x/\.7/' foo.log

It'll substitute the first instance of '.x' with '.7' on each line of the log file.
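
Both one-liners above rely on the masked address being the first ".x" on the line. If that ever stops being true, for example a line that starts with something other than an address, a stricter pattern may be safer. Here is a minimal Python sketch of that idea, assuming the masked address is always three numeric octets followed by ".x" at the very start of the line. The names MASKED_IP and fix_ip are made up for illustration and are not from any answer here.

```python
import re

# Assumption: a masked address is three numeric octets plus ".x" at the start
# of the line, followed by whitespace or the end of the line.
MASKED_IP = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.x(?=\s|$)')

def fix_ip(line, last_octet='7'):
    """Rewrite a leading 'a.b.c.x' to 'a.b.c.<last_octet>'; leave other lines alone."""
    # A function replacement avoids ambiguity between a backreference and the digit.
    return MASKED_IP.sub(lambda m: m.group(1) + '.' + last_octet, line, count=1)

if __name__ == '__main__':
    sample = '216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....'
    print(fix_ip(sample))
```

Because the pattern is anchored with ^, a ".x" later in the line (say, inside a requested path) is never touched, and lines that do not start with a masked address pass through unchanged.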


I don't see the purpose of putting "7" in every IP, since that's inaccurate. Nevertheless, here's an awk one-liner:

$ awk '{sub(/x$/,7,$1)}1' file
216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....


Python (run with file to process as the first argument):

import sys
import gzip

# Read the gzipped log as bytes and write a recompressed copy with a ".new" suffix
with gzip.open(sys.argv[1], 'rb') as fin, \
     gzip.open(sys.argv[1] + '.new', 'wb', 9) as fout:
    for line in fin:
        # Split off the leading address, then drop its masked last part
        address, rest = line.split(b' ', 1)
        prefix, _ = address.rsplit(b'.', 1)
        fout.write(prefix + b'.7 ' + rest)


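You can use this little Python script (which could probably be written in fewer lines than this):

import sys

for line in sys.stdin:
    # Separate the leading IP address from the rest of the log line
    ip_number, rest = line.split(' ', 1)
    ip_parts = ip_number.split('.')
    ip_parts[3] = '7'  # overwrite the masked last octet
    ip_number = '.'.join(ip_parts)
    # rest still ends with its own newline, so suppress the one print adds
    print(ip_number, rest, end='')

Save it as fixip.py and run it with Python 3:

python3 fixip.py < access.log > output.txt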
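
One caveat with the split-based approach: a blank line, or a line whose first field is not a dotted quad, raises an exception and stops the run partway through a 200MB file. A hedged variant that passes such lines through untouched could look like this. The function name replace_last_octet is hypothetical, and treating "anything that is not a.b.c.x" as "leave alone" is an assumption about what you want.

```python
import sys

def replace_last_octet(line, value='7'):
    """Replace a masked last octet in the first field; return other lines unchanged."""
    # partition never raises: sep and rest are empty if the line has no space
    address, sep, rest = line.partition(' ')
    parts = address.split('.')
    if len(parts) != 4 or parts[3] != 'x':
        return line  # first field is not of the form a.b.c.x, so leave it as is
    parts[3] = value
    return '.'.join(parts) + sep + rest

if __name__ == '__main__':
    for line in sys.stdin:
        sys.stdout.write(replace_last_octet(line))
```

It is used the same way as fixip.py, reading the log on standard input and writing the fixed log to standard output.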