RegEx pattern to match over a single line or more
I'm parsing a log file to identify and retrieve information about failures. Regular Expressions seem to be the right way to go about this.
Here's my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*
This works well for single lines like this:
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
This doesn't work for information that spans multiple lines.
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |StackLine:0:0
Here is what a couple of lines in the log look like:
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
The phrase "StackLine" represents a method signature in the dumped call stack. For example, here are two different "StackLine" examples:
ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0
and
OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17
In an ideal world, I would just get the line, starting at the time stamp through that first line:character notation (which is frequently 0:0).
How would I go about creating a pattern that would match both?
This will match a line starting with a date and all lines following it that do not start with a date.
^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*
Here is a Rubular example: http://www.rubular.com/r/1BIoLZ5tfs
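As a quick check of the pattern above, here is a minimal sketch in Python. Python is my assumption, since the question names no language, and the variable names are only illustrative. It relies on re.MULTILINE so that ^ and $ anchor at line boundaries.

```python
import re

# Shortened sample based on the question's log excerpt.
log = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0\n"
    "4th StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0\n"
    "6th StackLine:0:0"
)

# A timestamped line, then any following lines that do not start with a date.
entry = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*",
    re.MULTILINE,
)

# Each match is one complete log entry, continuation lines included.
entries = [m.group(0) for m in entry.finditer(log)]
for e in entries:
    print(repr(e))
```

Each element of entries should be one full entry, so splitting it on "\n" gives back the header line plus its continuation lines.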
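Edit 2: If you want to stop at the first :0:0, you can use the following regex, as long as the option that makes the . character match newlines is enabled. Ruby (which Rubular uses) calls this option multiline (/m); most other flavors call it single-line or dot-all, and there you also need the multiline option so that ^ matches at the start of each line: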
^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+
And here is a new Rubular: http://www.rubular.com/r/rfR1wqDHR8
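Outside of Ruby the flag names differ, so here is a hedged Python sketch of the same idea (Python and the variable names are my assumptions). It combines re.MULTILINE, so ^ matches at each line start, with re.DOTALL, so the lazy .*? can cross line breaks until the first :line:column pair.

```python
import re

# Shortened sample based on the question's log excerpt.
log = (
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0\n"
    "6th StackLine:0:0\n"
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0\n"
)

# Lazy match from the timestamp up to the first ":digits:digits" pair.
pattern = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:.*?:\d+:\d+",
    re.MULTILINE | re.DOTALL,
)

matches = [m.group(0) for m in pattern.finditer(log)]
for m in matches:
    print(repr(m))
```

The remaining stack lines ("6th StackLine:0:0" here) are skipped because they do not start with a timestamp.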
var log = @"2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0 4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0";
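// The trailing lazy .*? matches as little as possible (nothing), so each match
// is just the timestamp prefix; counting the matches counts the log entries.
var regex = @"\d{4}-\d{2}-\d{2}\s\d{2}.*?";
var matches = Regex.Matches(log, regex); // needs: using System.Text.RegularExpressions;
var count = matches.Count; // count = 4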
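Here is a regular expression that matches across line breaks, from the first timestamp to the end of the input:
\d{4}-\d{2}-\d{2} \d{2}[\S\s]*
The reason your regex didn't work is that the dot rarely functions as "match everything": by default it does not match newlines. PCRE has modifiers for this, and the one you need is PCRE_DOTALL (the s modifier). You didn't specify a language, so I can't give you more than a PHP example: preg_match('/\d{4}-\d{2}-\d{2} \d{2}.*/s', $log, $match), where $log and $match are placeholder names for the log text and the result array.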
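To make the difference concrete, here is a small Python sketch (Python is my assumption; its re.DOTALL flag plays the role of PCRE_DOTALL). It compares the plain dot, the dot with dot-all enabled, and the [\S\s] character class.

```python
import re

# Shortened sample based on the question's log excerpt.
log = (
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |StackLine:0:0\n"
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0"
)

pattern = r"\d{4}-\d{2}-\d{2} \d{2}.*"

# Default: the dot stops at each newline, so only timestamped lines match.
default = re.findall(pattern, log)

# Dot-all: the greedy .* runs to the end of the input, giving a single match.
dotall = re.findall(pattern, log, re.DOTALL)

# [\S\s] matches any character without needing a flag.
charclass = re.findall(r"\d{4}-\d{2}-\d{2} \d{2}[\S\s]*", log)

print(len(default), len(dotall), len(charclass))
```

Note that the greedy forms swallow every later entry into one match, so splitting the log into separate entries still needs a lazy quantifier or a lookahead.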
var rx = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$",
RegexOptions.Multiline);
var matches = rx.Matches(yourText);
Be aware that \d can also match non-European digits (it matches every character in the Unicode 'Number, Decimal Digit' category), but since your file is quite "fixed" in format, you shouldn't have any problem.
The regex above will work only if there is a blank line at the end of each "log". Even the last log must be followed by a blank line, so the format must be:
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine
secondary line of the previous line
(blank)
2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine
(blank)
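On the \d caveat mentioned above: the same behavior can be observed in Python, used here only as an illustration (my assumption, not the answer's language), where \d in a str pattern also matches Unicode decimal digits unless re.ASCII is set. An explicit [0-9] class is a portable way to restrict the match.

```python
import re

# "2011" written with Arabic-Indic digits (U+0660 to U+0669, category Nd).
arabic_indic = "\u0662\u0660\u0661\u0661"

# \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r"\d{4}", arabic_indic)

# With re.ASCII, \d is limited to 0-9.
ascii_flag_match = re.fullmatch(r"\d{4}", arabic_indic, re.ASCII)

# An explicit class is limited to 0-9 regardless of flags.
explicit_class_match = re.fullmatch(r"[0-9]{4}", arabic_indic)

print(unicode_match is not None, ascii_flag_match, explicit_class_match)
```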