Regular Expression - Repeating Groups
I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:
...
COMPUTER INFO:
Computer Name: TESTCMP02
Windows User Name: testUser99
Time Since Last Reboot: 405 Minutes
Processor: (2 processors) Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
OS Version: 5.1 .number 2600:Service Pack 2
Memory: RAM: 48% used, 3069.6 MB total, 1567.3 MB free
ServerTimeOffSet: -146 Seconds
Use Local Time for Log: True
INITIAL SETTINGS:
Command Line: /SKIPUPDATES
Remote Online: True
INI File: c:\demoapp\system\DEMOAPP.INI
DatabaseName: testdb
SQL Server: 10.254.58.1
SQL UserName: SQLUser
ODBC Source: TestODBC
Dynamic ODBC (not defined): True
...
I would like to capture each 'block' of data, using the header as one group and the data as a second (i.e. "COMPUTER INFO", "Computer Name:......."), and repeat this for each block. The expression I have so far is
(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)
This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:
(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*
which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .NET Regex class to apply the expressions. Can anyone help me out here?
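Repeating a group does not give you one group per repetition: the group's Value contains only the last match. (.NET does keep the earlier matches in Group.Captures, but pairing each header with its data that way gets awkward.)
The simpler route is to break this into two problems. First, find each section: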
new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
And then, within each match, use another regex to match each field/value into groups:
new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
The code to use this would look something like this:
Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    MatchCollection nameValues = nameValueRegex.Matches(section.ToString());
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        string value = nameValue.Groups["value"].Value;
        // OK, do something here.
    }
}
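For reference, here is a self-contained sketch of the two-pass approach above, written as a C# top-level program. The sample string is hypothetical: it assumes the fields are indented under each header (the field regex requires leading whitespace) and that lines end in CRLF. The `settings` dictionary and the `Trim()` calls are additions for illustration. Without `Trim()`, `.*` leaves a trailing `\r` in each value on CRLF input, because `$` in Multiline mode matches just before `\n`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample: indented fields and CRLF line endings are assumptions.
string logData =
    "COMPUTER INFO:\r\n" +
    "    Computer Name: TESTCMP02\r\n" +
    "    Windows User Name: testUser99\r\n" +
    "\r\n" +
    "INITIAL SETTINGS:\r\n" +
    "    Command Line: /SKIPUPDATES\r\n" +
    "    Remote Online: True\r\n";

var sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*",
    RegexOptions.Singleline | RegexOptions.Multiline);
var nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$",
    RegexOptions.Multiline);

// header text -> (field name -> field value)
var settings = new Dictionary<string, Dictionary<string, string>>();

foreach (Match section in sectionRegex.Matches(logData))
{
    string text = section.Value;
    // The section match starts with the header line; keep the part before its colon.
    string header = text.Substring(0, text.IndexOf(':'));

    var fields = new Dictionary<string, string>();
    foreach (Match nameValue in nameValueRegex.Matches(text))
    {
        // Trim() removes the '\r' that .* captures before each '\n'.
        fields[nameValue.Groups["name"].Value.Trim()] = nameValue.Groups["value"].Value.Trim();
    }
    settings[header] = fields;
}

foreach (var entry in settings)
{
    Console.WriteLine(entry.Key);
    foreach (var field in entry.Value)
    {
        Console.WriteLine($"  {field.Key} = {field.Value}");
    }
}
```

Values that themselves contain a colon (the INI File path, for instance) should still come through whole, since the name group stops at the first colon on the line.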
((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+
or, if you have empty lines between items:
(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
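With a repeated group like this, `Group.Value` only gives the last iteration; in .NET the individual iterations are read from `Group.Captures`. A minimal sketch using the first pattern, assuming a hypothetical three-line input with CRLF endings (the variable names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: three field lines, each with a value, CRLF line endings.
string block =
    "Computer Name: TESTCMP02\r\n" +
    "Windows User Name: testUser99\r\n" +
    "Remote Online: True\r\n";

var lineRegex = new Regex(@"((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+");
Match match = lineRegex.Match(block);

// Group.Value reflects only the final iteration of the repeated group.
string lastHeader = match.Groups["header"].Value;

// Group.Captures holds one Capture per iteration, in match order.
string[] headers = match.Groups["header"].Captures
    .Cast<Capture>().Select(c => c.Value).ToArray();
string[] contents = match.Groups["content"].Captures
    .Cast<Capture>().Select(c => c.Value.Trim()).ToArray();

// Pairing by index is only safe here because every line has a value.
for (int i = 0; i < headers.Length && i < contents.Length; i++)
{
    Console.WriteLine($"{headers[i]} {contents[i]}");
}
```

Note that the `content` group is optional: a line with nothing after the colon (such as a block header) adds a capture to `header` but none to `content`, so the two collections can fall out of step. In that case, comparing `Capture.Index` values is one way to line them up again.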
Here is how I would go about it. This lets you get the value of a specific field easily, but the expression is a bit more complicated. I added line feeds to make it easier to read; join the lines back together before using it. Here is the start:
COMPUTER INFO:.*
Computer Name:\s*(?<ComputerName>[\w\s]+).*
Windows User Name:\s*(?<WindowUserName>[\w\s]+).*
Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+).*
(?# This continues on through each of the lines... )
with the RegexOptions Compiled, IgnoreCase, Singleline, and CultureInvariant.
Then you can read each value from the named groups, e.g.:
string computerName = match.Groups["ComputerName"].Value;
string windowUserName = match.Groups["WindowUserName"].Value;
// etc.
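A runnable sketch of this single-expression approach, covering only the first three fields and using a hypothetical CRLF sample. One thing to watch: `[\w\s]+` is greedy and also matches line breaks, so each captured value carries trailing whitespace, which is why the sketch calls `Trim()`. It also stops at the first character that is neither a word character nor whitespace, so a value such as "3069.6 MB" would be cut at the dot.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample with CRLF line endings.
string logData =
    "COMPUTER INFO:\r\n" +
    "Computer Name: TESTCMP02\r\n" +
    "Windows User Name: testUser99\r\n" +
    "Time Since Last Reboot: 405 Minutes\r\n";

// The pattern pieces are concatenated into one line with no added whitespace.
var infoRegex = new Regex(
    @"COMPUTER INFO:.*" +
    @"Computer Name:\s*(?<ComputerName>[\w\s]+).*" +
    @"Windows User Name:\s*(?<WindowUserName>[\w\s]+).*" +
    @"Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase |
    RegexOptions.Singleline | RegexOptions.CultureInvariant);

Match match = infoRegex.Match(logData);

// Trim() strips the line break (and any indentation) swallowed by [\w\s]+.
string computerName = match.Groups["ComputerName"].Value.Trim();
string windowUserName = match.Groups["WindowUserName"].Value.Trim();
string timeSinceLastReboot = match.Groups["TimeSinceLastReboot"].Value.Trim();

if (match.Success)
{
    Console.WriteLine($"{computerName} / {windowUserName} / {timeSinceLastReboot}");
}
```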
Some links regarding repeating groups in regular expressions:
- http://www.regular-expressions.info/captureall.html
- http://bytes.com/topic/python/answers/856077-how-get-all-repeated-group-regular-expression