开发者

filter log file by defining regexes

I have some HUGE log files (50Mb; ~500K lines) I need to start filtering some of the crap out of. The log files are being produced using log4j and have the basic pattern of:

[log-level] date-time class etc, etc  
log-message  

I'm looking for a way that I can identify a regex start and regex end (or something similar) that will filter out the matching entries from the file so I can more easily wade through these massive files. My thoughts are that the start regex would be the log-level and the end regex would be something in the log-message. I'm sure I could write a java program to accomplish this task, but I thought I'd ask the community before going down that path. Thanks in advance.


Let me expand on my question. Let's assume I have the following snippet in my log file:

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-6

I'd like a way to filter out logEntry1 and logEntry2 so I end up with:

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-6

I would hope to accomplish this be defining some sets of rege开发者_如何学Cx patterns pairs. In my example above, I'd want to define a pair for logEntry1 and another for logEntry2.

I hope that helps clarify my question.


Assuming log-message-1 and log-message-2 and unique patterns.

$ awk -vRS= '!/log-message-[12]/' ORS="\n\n" file
[DEBUG] date-time class etc, etc
log-message-3

[DEBUG] date-time class etc, etc
log-message-6


(zyx:~) % echo $T
[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-6
(zyx:~) % echo $T | perl -e '$_=join("", <>); s/\[DEBUG\][^\n]*\n(log-message-1|log-message-2).*?(?=\n\[(DEBUG|WARN)\]|$)//sg; s/\[WARN\].*?(?=\n\[(DEBUG|WARN)\]|$)//sg; print;'


[DEBUG] date-time class etc, etc  
log-message-3



[DEBUG] date-time class etc, etc  
log-message-6


Use awk or awk-styled perl one-liners.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜