awk a huge logfile from the end until a timestamp
I want to get the last part of a possibly huge logfile (50 MB up to 1000 MB), starting from a given timestamp "t0":
 __________________
|1   xxx xxx ...   |
|2   xxx ...       |  <-- uninteresting part
|4   ...           |
|...               |
|423 ...           |  <-- timestamp t0
|425 xxx ...       |
|437 ...           |
|...               |  <-- I want this part (from t0 to EOF)
|__________________|
An additional constraint is that I want to do this using simple bash commands. A simple solution would be:
awk '$1 > 423' file.log
but this scans the whole file, including all the uninteresting lines. There is the tail command, but I can only give it the number of last lines I want, which I don't know - I just know the timestamp. Is there a way of "awking" from the end and stopping as soon as a timestamp no longer matches?
tac is your friend here:
tac file.log | awk '{ if ($1 >= 423) print; else exit; }' | tac
tac dumps the lines of a file starting with the last line and working back towards the beginning. Do it once to get the lines you want, then do it again to restore their order. Because awk exits as soon as it sees a timestamp below the threshold, the first tac is stopped by the broken pipe shortly afterwards, so the uninteresting front part of the file is never read.
If I understand right, you just need to get the lines from a line matching a timestamp regexp to the end of the file.
Let's say your huge file is something like this:
~$ cat > file << EOF
rubish
n lines of rubish
more rubish
timestamp regexp
interesting
n interesting lines
interesting
end of file
EOF
If you are able to get a feasible regexp for the timestamp you are looking for, you can get the part you want with sed:
~$ sed -n '/timestamp regexp/,$ {p}' file
timestamp regexp
interesting
n interesting lines
interesting
end of file
Using standard Unix commands, there isn't much you can do other than scan the entire file. If you write your own program, you could do a binary search on the file:
- seek to a point in the file,
- read forwards to the next start of record,
- check whether the timestamp is too big or too small,
- and iterate until you find the right point in the file.
You might even do a search with linear interpolation rather than a pure binary search if the time stamps are pure numbers; it probably isn't worth the extra coding if the stamps are more complex, but it depends on how often you're going to need this.
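For what it's worth, here is a rough, untested sketch of that binary-search idea in plain bash. The file name file.log, the threshold 423 and the 4096-byte probe size are just placeholders, and it assumes numeric, monotonically increasing timestamps in the first column. It bisects on byte offsets, peeks at the first complete line after each probe position, and leaves the exact filtering of the small remaining tail to awk:
#!/bin/bash
# Sketch only (untested): binary search by byte offset over file.log for timestamp 423.
file=file.log
target=423
lo=0
hi=$(wc -c < "$file")                     # total file size in bytes
while (( hi - lo > 4096 )); do
    mid=$(( (lo + hi) / 2 ))
    # First *complete* line after the midpoint: skip the (probably partial)
    # first line of the probe and take the second one.
    ts=$(tail -c +"$(( mid + 1 ))" "$file" | head -c 4096 | sed -n '2p' | awk '{ print $1 }')
    if [[ -n $ts ]] && (( ts < target )); then
        lo=$mid                           # still before t0
    else
        hi=$mid                           # at or past t0 (or probe too short)
    fi
done
# Only a small window around t0 is left; awk does the exact filtering.
if (( lo > 0 )); then
    tail -c +"$(( lo + 1 ))" "$file" | awk -v t="$target" 'NR > 1 && $1 >= t'
else
    awk -v t="$target" '$1 >= t' "$file"
fi
On a 50 MB to 1 GB file this touches only a handful of 4 KB probes plus the interesting tail, instead of the whole file.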
Indeed, unless you are going to be doing this a lot and can demonstrate that the performance is a problem, I'd go with the simple awk solution.
You can poll until you hit "423". Just a hypothetical example (not tested):
n=100                                  # number of lines to look back initially
while true                             # note: loops forever if no line starts with 423
do
    if tail -n "$n" file | grep -q "^423 " ; then   # anchored so e.g. "1423" doesn't match
        tail -n "$n" file | awk '$1 > 423'
        break
    else
        (( n += 100 ))                 # look 100 lines further back and retry
    fi
done