Processing a log to fix a malformed IP address ?.?.?.x
I would like to replace the first 'x' character on every line of a log file with the digit '7', using a shell script. Example of the log file:
216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/....
My humble beginnings...
#!/bin/bash
echo Starting script...
cd /Users/me/logs/
gzip -d /Users/me/logs/access.log.gz
echo Files unzipped...
echo "I'm totally lost here: how do I process the log file and save it back to disk?"
exit 0
Why is the log file IP malformed like this? My web provider (1and1) has decided not to store IP addresses, so they have replaced the last number with the character 'x'. They told me it was a new requirement by 'law'. I personally think that is BS, but that would take us off topic.
I want to process these log files with AWstats, so I need an IP address that is not malformed. I want to replace the x with a 7, like so:
216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....
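For reference, the transformation above can be sketched in Python with a regular expression anchored to the start of the line. This assumes every masked line begins with the address and that the mask is always a literal ".x" ending the first space-separated field; the names `MASKED_IP` and `fix_ip` are hypothetical:

```python
import re

# Assumption: three dotted numbers, then ".x", then a space, at the very start
# of the line. Anchoring with ^ keeps later 'x' characters in the URL untouched.
MASKED_IP = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.x(?= )')


def fix_ip(line):
    """Replace a masked last octet 'x' with '7'; other lines pass through."""
    return MASKED_IP.sub(r'\g<1>.7', line)


if __name__ == '__main__':
    sample = [
        '216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....',
        '74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/....',
        '222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/....',
    ]
    for line in sample:
        print(fix_ip(line))
```

Lines that do not start with a masked address are returned unchanged, so running it twice over the same file is harmless.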
It's not perfect, I know, but at least I can process the files, and I can still gain a lot of useful information like country, number of visitors, etc. The log files are 200 MB each, so I thought a shell script was the way to go because I can run it quickly on my MacBook Pro locally. Unfortunately, I know very little about shell scripting, and my JavaScript skills are not going to cut it this time. I appreciate your help.
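Since everyone is posting alternative solutions, I'm going to post one that I think is very simple:
sed 's/\.x/.7/' input_file > output_file
This replaces the first ".x" on each line with ".7". The single quotes stop the shell from stripping the backslash, so sed matches a literal dot rather than any character.
Hope it helps! :)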
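The pattern s/\.x/.7/ has no g flag, so it replaces only the first ".x" on a line, and that first match is the address only because the address comes first. A quick Python check of that behavior, using a hypothetical line whose address is already complete and whose URL contains ".xml", shows where an unanchored pattern can misfire. The helper names `sed_style` and `anchored` are illustrative only:

```python
import re


def sed_style(line):
    # Mimics sed 's/\.x/.7/': the first ".x" anywhere on the line.
    return re.sub(r'\.x', '.7', line, count=1)


def anchored(line):
    # Only rewrite ".x" when it ends the first space-separated field.
    return re.sub(r'^(\S*)\.x(?= )', r'\g<1>.7', line, count=1)


masked = '74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /feed.xml'
intact = '74.131.77.12 [01/Mar/2010:00:25:37 +0100] "GET /feed.xml'  # hypothetical

for line in (masked, intact):
    print(sed_style(line))
    print(anchored(line))
```

On the masked line both versions agree; on the intact line the unanchored version turns "/feed.xml" into "/feed.7ml", while the anchored one leaves the line alone. With 1and1's logs every line is masked, so the simple sed command is fine in practice.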
The following perl one-liner should do the trick:
perl -p -i -e 's/\.x/\.7/' foo.log
It'll substitute the first instance of '.x' with '.7' on each line of the log file.
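Perl's -i flag rewrites the file in place. If you prefer Python, the standard library's `fileinput` module has a similar in-place mode. A minimal sketch, assuming the same ".x" mask and keeping a ".bak" backup of each original; `fix_in_place` is a hypothetical helper name:

```python
import fileinput
import re
import sys

# Assumption: the mask is ".x" at the end of the first space-separated field.
MASKED_IP = re.compile(r'^(\S*)\.x(?= )')


def fix_in_place(path):
    """Rewrite path in place, keeping the original as path + '.bak'."""
    # With inplace=True, whatever is printed replaces the file's contents.
    with fileinput.input(files=(path,), inplace=True, backup='.bak') as f:
        for line in f:
            # line keeps its newline, so suppress print's own.
            print(MASKED_IP.sub(r'\g<1>.7', line, count=1), end='')


if __name__ == '__main__':
    # Process every file named on the command line (none given: do nothing).
    for name in sys.argv[1:]:
        fix_in_place(name)
```

Run it as python3 fixinplace.py access.log (the script name is arbitrary); the untouched original is left next to it as access.log.bak.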
While I don't see the point of putting "7" in every IP, since that's inaccurate, here's an awk one-liner nevertheless:
$ awk '{sub(/x$/,7,$1)}1' file
216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....
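The awk version edits only the first whitespace-separated field ($1), so nothing later in the line can be hit by accident. The same field-based idea in Python, without a regular expression, might look like this; `fix_first_field` is a hypothetical name, and unlike awk's /x$/ it requires the field to end in ".x" rather than any trailing "x":

```python
def fix_first_field(line):
    """Rewrite a trailing '.x' in the first space-separated field to '.7'."""
    # partition keeps the separator and the rest of the line exactly as-is.
    first, sep, rest = line.partition(' ')
    if first.endswith('.x'):
        first = first[:-1] + '7'
    return first + sep + rest


if __name__ == '__main__':
    print(fix_first_field('216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....'))
```

One small difference from awk: assigning to $1 makes awk rebuild the line with single spaces between fields, whereas partition leaves the original spacing untouched.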
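Python 3 (run with the gzipped log file as the first argument; the result is written to a new gzipped file with ".new" appended to the name):
import sys
import gzip

# 'rt'/'wt' open the gzip streams in text mode so lines are str, not bytes.
with gzip.open(sys.argv[1], 'rt') as fin, \
        gzip.open(sys.argv[1] + '.new', 'wt', compresslevel=9) as fout:
    for line in fin:
        # The IP address is everything before the first space.
        address, rest = line.split(' ', 1)
        # Keep the first three octets and drop the masked last one.
        prefix, node = address.rsplit('.', 1)
        fout.write('%s.7 %s' % (prefix, rest))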
You can use this little Python 3 script (which could probably be written in fewer lines than this):
import sys

for line in sys.stdin:
    # Split off the IP address, which is everything before the first space.
    ip_number, rest = line.split(' ', 1)
    ip_parts = ip_number.split('.')
    # Overwrite the masked fourth octet.
    ip_parts[3] = '7'
    ip_number = '.'.join(ip_parts)
    # rest still ends with the newline, so suppress print's own.
    print(ip_number, rest, end='')
Save it as fixip.py and run it as:
python3 fixip.py < access.log > output.txt