awk a huge logfile from the end until a timestamp
I want to get the last part of a possibly huge logfile (50 MB up to 1000 MB), starting from a given timestamp "t0":
 __________________
|1   xxx xxx ...   |
|2   xxx ...       |  <-- uninteresting part
|4   ...           |
|...               |
|423 ...           |  <-- timestamp t0
|425 xxx ...       |
|437 ...           |
|...               |  <-- I want this part (from t0 to EOF)
|__________________|
An additional constraint is that I want to do this using simple bash commands. A simple solution would be:
awk '$1 > 423' file.log
but this scans the whole file, including all the uninteresting lines. There is the tail command, but I can only give it the number of last lines I want, which I don't know - I just know the timestamp. Is there a way of "awking" from the end and stopping as soon as a timestamp no longer matches?
tac is your friend here:
tac file.log | awk '{ if ($1 >= 423) print; else exit; }' | tac
tac dumps the lines of a file starting with the last line and working back towards the beginning. Do it once to get the lines you want, then do it again to restore their order. Because awk exits as soon as it sees a timestamp below the threshold, the first tac is stopped by the broken pipe shortly afterwards, so the uninteresting front part of the file is never read.
If I understand right, you just need to get the lines from a line matching a timestamp regexp to the end of the file.
Let's say your huge file is something like this:
~$ cat > file << EOF
rubish
n lines of rubish
more rubish
timestamp regexp
interesting
n interesting lines
interesting
end of file
EOF
If you are able to get a feasible regexp for the timestamp you are looking for, you can get the part you want with sed:
~$ sed -n '/timestamp regexp/,$ {p}' file
timestamp regexp
interesting
n interesting lines
interesting
end of file
Using standard Unix commands, there isn't much you can do other than scan the entire file. If you write your own program, you could do a binary search on the file:
- seek to a point in the file,
- read forwards to the next start of record,
- check whether the timestamp is too big or too small,
- and iterate until you find the right point in the file.
You might even do a search with linear interpolation rather than a pure binary search if the time stamps are pure numbers; it probably isn't worth the extra coding if the stamps are more complex, but it depends on how often you're going to need this.
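For what it's worth, here is a rough, untested sketch of that binary-search idea in plain bash. The file name file.log, the threshold 423 and the 4096-byte probe size are just placeholders, and it assumes numeric, monotonically increasing timestamps in the first column. It bisects on byte offsets, peeks at the first complete line after each probe position, and leaves the exact filtering of the small remaining tail to awk:
#!/bin/bash
# Sketch only (untested): binary search by byte offset over file.log for timestamp 423.
file=file.log
target=423
lo=0
hi=$(wc -c < "$file")                     # total file size in bytes
while (( hi - lo > 4096 )); do
    mid=$(( (lo + hi) / 2 ))
    # First *complete* line after the midpoint: skip the (probably partial)
    # first line of the probe and take the second one.
    ts=$(tail -c +"$(( mid + 1 ))" "$file" | head -c 4096 | sed -n '2p' | awk '{ print $1 }')
    if [[ -n $ts ]] && (( ts < target )); then
        lo=$mid                           # still before t0
    else
        hi=$mid                           # at or past t0 (or probe too short)
    fi
done
# Only a small window around t0 is left; awk does the exact filtering.
if (( lo > 0 )); then
    tail -c +"$(( lo + 1 ))" "$file" | awk -v t="$target" 'NR > 1 && $1 >= t'
else
    awk -v t="$target" '$1 >= t' "$file"
fi
On a 50 MB to 1 GB file this touches only a handful of 4 KB probes plus the interesting tail, instead of the whole file.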
Indeed, unless you are going to be doing this a lot and can demonstrate that the performance is a problem, I'd go with the simple awk solution.
You can poll until you hit "423". Just a hypothetical example (not tested):
n=100                                  # number of lines to look back initially
while true                             # note: loops forever if no line starts with 423
do
    if tail -n "$n" file | grep -q "^423 " ; then   # anchored so e.g. "1423" doesn't match
        tail -n "$n" file | awk '$1 > 423'
        break
    else
        (( n += 100 ))                 # look 100 lines further back and retry
    fi
done