Filter log file by defining regexes
I have some huge log files (50 MB; ~500K lines) that I need to start filtering the noise out of. The log files are produced by log4j and follow this basic pattern:
[log-level] date-time class etc, etc
log-message
I'm looking for a way to define a start regex and an end regex (or something similar) that will filter the matching entries out of the file, so I can wade through these massive files more easily. My thought is that the start regex would match the log level and the end regex would match something in the log message. I'm sure I could write a Java program to accomplish this, but I thought I'd ask the community before going down that path. Thanks in advance.
Let me expand on my question. Let's assume I have the following snippet in my log file:
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-6
I'd like a way to filter out the entries containing log-message-1 and log-message-2 (call them logEntry1 and logEntry2) so I end up with:
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-6
I would hope to accomplish this by defining some sets of regex pattern pairs. In my example above, I'd want to define one pair for logEntry1 and another for logEntry2.
I hope that helps clarify my question.
Assuming log-message-1 and log-message-2 are unique patterns, and that entries are separated by blank lines (awk's paragraph mode, RS=, splits records on them):
$ awk -vRS= '!/log-message-[12]/' ORS="\n\n" file
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-6
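The one-liner above uses paragraph mode, so it depends on blank lines between entries. If your log has none, a rough sketch along these lines may work instead. It assumes every entry starts with a line beginning with a bracketed upper-case level such as [DEBUG] or [WARN], and it takes one start/end regex pair per call. The function name filter_log and the environment variable names START_RE and END_RE are made up for this example.

```shell
# filter_log START_RE END_RE [FILE...]
# Hypothetical helper: drops every log entry whose header line matches
# START_RE and whose following lines (the message) match END_RE.
# Assumption: an entry begins at a line starting with "[LEVEL]".
filter_log() {
    start_re=$1
    end_re=$2
    shift 2
    # The regexes are passed through the environment rather than -v,
    # because -v would process backslash escapes such as \[ .
    START_RE="$start_re" END_RE="$end_re" awk '
        BEGIN { start_re = ENVIRON["START_RE"]; end_re = ENVIRON["END_RE"] }

        # Print the buffered entry unless it matches the regex pair.
        function emit() {
            if (entry != "" && !(header != "" && header ~ start_re && body ~ end_re))
                printf "%s", entry
            entry = ""; header = ""; body = ""
        }

        # A line beginning with "[LEVEL]" closes the previous entry
        # and starts a new one.
        /^\[[A-Z]+\]/ { emit(); header = $0; entry = $0 "\n"; next }

        # Any other line belongs to the message of the current entry.
        { entry = entry $0 "\n"; body = body $0 "\n" }

        END { emit() }
    ' "$@"
}
```

Several pairs can be applied by chaining calls, since the function reads standard input when no file is given, for example: filter_log '^\[DEBUG\]' 'log-message-1' file | filter_log '^\[WARN\]' 'log-message-2'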
(zyx:~) % echo $T
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-6
(zyx:~) % echo $T | perl -e '$_=join("", <>); s/\[DEBUG\][^\n]*\n(log-message-1|log-message-2).*?(?=\n\[(DEBUG|WARN)\]|$)//sg; s/\[WARN\].*?(?=\n\[(DEBUG|WARN)\]|$)//sg; print;'
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-6
Use awk or awk-style Perl one-liners.