RegEx pattern to match over a single line or more

I'm parsing a log file to identify and retrieve information about failures. Regular Expressions seem to be the right way to go about this.

Here's my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*

This works well for single lines like this:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0

This doesn't work for information that spans multiple lines.

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |StackLine:0:0

Here is what a couple of lines in the log look like:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0 

 2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |7th StackLine:0:0  
6th StackLine:0:0  
5th StackLine:0:0  
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0

The phrase "StackLine" represents a method signature in the dumped call stack. For example, here are two different "StackLine" examples:

ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0  

and

OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17

In an ideal world, I would just get the line, starting at the time stamp through that first line:character notation (which is frequently 0:0).

How would I go about creating a pattern that would match both?


This will match a line starting with a date and all lines following it that do not start with a date.

^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*

Here is a Rubular example: http://www.rubular.com/r/1BIoLZ5tfs
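
As a rough illustration of the pattern above, here is a minimal sketch in Python (the thread does not name a language, so Python's re module is an assumption, as are the names LOG, ENTRY and split_entries). The sample is a shortened version of the log excerpt in the question. The ^ and $ anchors only match at line boundaries when the multi-line flag is set.

```python
import re

# Shortened sample based on the log excerpt in the question (no trailing newline).
LOG = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0\n"
    "4th StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0\n"
    "6th StackLine:0:0"
)

# A line starting with a date, then any following lines that do not start with a date.
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*",
    re.MULTILINE,  # lets ^ and $ match at every line boundary
)


def split_entries(text):
    """Return each log entry (dated line plus its continuation lines) as one string."""
    return [m.group(0) for m in ENTRY.finditer(text)]


entries = split_entries(LOG)
for entry in entries:
    print(repr(entry))
```

If the input ends with a newline, the last entry returned by this sketch will include that trailing newline, so stripping each entry may be worthwhile.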

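edit 2: If you want to stop at the first :0:0, you can use the following regex, as long as the option that lets the . character match newlines is enabled. Ruby (and so Rubular) calls this option multiline (/m); most other engines call it DOTALL or single-line (/s). Outside Ruby, the ^ anchor additionally needs the multi-line option to match at the start of each line: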

^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+

And here is a new Rubular: http://www.rubular.com/r/rfR1wqDHR8
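
A minimal Python sketch of the same idea, assuming Python's re module (the names LOG, HEAD and heads are made up for illustration). In Python both flags are needed: re.MULTILINE so that ^ matches at each line start, and re.DOTALL so that the lazy .*? can cross line breaks.

```python
import re

# Shortened sample based on the log excerpt in the question.
LOG = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0\n"
    "4th StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0\n"
    "6th StackLine:0:0"
)

# From the timestamp up to the first ":<digits>:<digits>" (the line:column suffix).
HEAD = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+",
    re.MULTILINE | re.DOTALL,
)

heads = [m.group(0) for m in HEAD.finditer(LOG)]
for head in heads:
    print(repr(head))
```

The timestamp itself does not end the match early, because the colon after the hour is consumed by the literal : in the pattern, and the seconds are followed by a dot rather than a second colon.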


var log = @"2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0 4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0";
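// The lazy .*? matches as little as possible (here: nothing), so each match is
// only the "yyyy-MM-dd HH" prefix. This counts the entries; it does not capture them.
var regex = @"\d{4}-\d{2}-\d{2}\s\d{2}.*?";
var matches = Regex.Matches(log, regex); // needs: using System.Text.RegularExpressions;
var count = matches.Count; // count = 4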


Here is a regular expression that matches all your lines:
\d{4}-\d{2}-\d{2} \d{2}[\S\s]*

The reason your regex didn't work is that, by default, the dot does not match newline characters, so it is not a true "match everything".
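
To make the difference concrete, here is a small Python sketch (Python is an assumption; TEXT and the variable names are illustrative only). It compares the default dot with the [\S\s] character class. Note that [\S\s]* is greedy, so it runs from the first timestamp to the end of the input rather than stopping at the next entry.

```python
import re

# Two entries; the second one spans two lines.
TEXT = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data"
)

# Default dot: stops at each line break, so continuation lines are lost.
per_line = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}.*", TEXT)

# [\S\s] matches any character, newlines included, and * is greedy.
greedy = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}[\S\s]*", TEXT)

print(len(per_line), len(greedy))
```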


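PCRE has pattern modifiers, and here you need s (PCRE_DOTALL), which makes the dot match newlines as well. You didn't specify a language, so I can only give you a PHP example (with $log standing in for your log text): preg_match('/\d{4}-\d{2}-\d{2} \d{2}.*/s', $log, $matches);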


var rx = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", 
                   RegexOptions.Multiline);

var matches = rx.Matches(yourText);

Be aware that \d can also match non-European digits: in .NET it matches every character in the Unicode 'Number, Decimal Digit' category. Since your file is quite "fixed" in format, this shouldn't cause any problem; use [0-9] or RegexOptions.ECMAScript if you need ASCII digits only.
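
Python's re module behaves the same way for str patterns, which makes the point easy to check. The following is only an illustrative sketch (Python and the variable names are assumptions, not part of the answer above); re.ASCII is Python's way of restricting \d to 0-9.

```python
import re

# "2011" written with Arabic-Indic digits (Unicode category Nd).
ARABIC_INDIC = "\u0662\u0660\u0661\u0661"

unicode_match = re.fullmatch(r"\d{4}", ARABIC_INDIC)          # \d is Unicode-aware
ascii_match = re.fullmatch(r"\d{4}", ARABIC_INDIC, re.ASCII)  # \d limited to 0-9
explicit_match = re.fullmatch(r"[0-9]{4}", ARABIC_INDIC)      # explicit ASCII class

print(bool(unicode_match), bool(ascii_match), bool(explicit_match))
```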

This works only if each log entry is followed by a blank line, including the last one, so the format must be:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine
secondary line of the previous line
(blank)
2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine
(blank)
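
The blank-line requirement can be sketched in Python as follows. This assumes Python's re module treats the pattern the same way for this input, which is not something the answer above states; the names PATTERN, WELL_FORMED and MISSING_BLANK are illustrative. The second input shows what happens when the blank line between two entries is missing: they come back as a single match.

```python
import re

PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$",
    re.MULTILINE,
)

# Every entry, including the last, is followed by a blank line.
WELL_FORMED = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine\n"
    "secondary line of the previous line\n"
    "\n"
    "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine\n"
    "\n"
)

# No blank line between the two entries.
MISSING_BLANK = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine\n"
    "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine\n"
    "\n"
)

# strip() drops the newlines that the match picks up around the blank line.
separated = [m.group(0).strip() for m in PATTERN.finditer(WELL_FORMED)]
merged = [m.group(0).strip() for m in PATTERN.finditer(MISSING_BLANK)]

print(len(separated), len(merged))
```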