Regular Expression - Repeating Groups

I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:

...
COMPUTER INFO:
 Computer Name:                 TESTCMP02
 Windows User Name:             testUser99
 Time Since Last Reboot:        405 Minutes
 Processor:                     (2 processors) Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
 OS Version:                    5.1 .number 2600:Service Pack 2
 Memory:                        RAM: 48% used, 3069.6 MB total, 1567.3 MB free
 ServerTimeOffSet:              -146 Seconds 
 Use Local Time for Log:        True

INITIAL SETTINGS:
 Command Line:                  /SKIPUPDATES
 Remote Online:                 True
 INI File:                      c:\demoapp\system\DEMOAPP.INI
 DatabaseName:                  testdb
 SQL Server:                    10.254.58.1
 SQL UserName:                  SQLUser
 ODBC Source:                   TestODBC
 Dynamic ODBC (not defined):    True
...

I would like to capture each 'block' of data, using the header as one group and the data as a second (i.e. "COMPUTER INFO", "Computer Name:......."), and repeat this for each block. The expression I have so far is

(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)

This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:

(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*

which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .NET Regex class to apply the expressions. Can anyone help me out here?


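A repeated group does not give you one result per repetition: after the match, the group's Value holds only its last capture. (.NET does keep every iteration in Group.Captures, but the approach below does not rely on that.)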

You'll need to break this into two problems. First, find each section:

new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);

And then, within each match, use another regex to match each field/value into groups:

new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);

The code to use this would look something like this:

Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    // section.Value is the header line plus all of its indented field lines.
    MatchCollection nameValues = nameValueRegex.Matches(section.Value);
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        // Trim() removes the trailing '\r' that '.' matches on \r\n line endings.
        string value = nameValue.Groups["value"].Value.Trim();
        // OK, do something here.
    }
}
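
The question also asked for the header of each block. A minimal sketch of one way to get it with the same two expressions is to read everything before the first colon of each section match. The sample string, the dictionary layout, and the variable names are assumptions for illustration, and the code assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical excerpt in the same shape as the log, with \r\n line endings.
string logData =
    "COMPUTER INFO:\r\n" +
    " Computer Name:                 TESTCMP02\r\n" +
    " Windows User Name:             testUser99\r\n" +
    "\r\n" +
    "INITIAL SETTINGS:\r\n" +
    " Command Line:                  /SKIPUPDATES\r\n" +
    " INI File:                      c:\\demoapp\\system\\DEMOAPP.INI\r\n";

Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);

// header -> (field name -> field value)
var blocks = new Dictionary<string, Dictionary<string, string>>();

foreach (Match section in sectionRegex.Matches(logData))
{
    string text = section.Value;
    // The header is whatever precedes the first colon of the section.
    string header = text.Substring(0, text.IndexOf(':')).Trim();

    var fields = new Dictionary<string, string>();
    foreach (Match nameValue in nameValueRegex.Matches(text))
    {
        // Trim() drops the '\r' that '.' picks up before '\n'.
        fields[nameValue.Groups["name"].Value.Trim()] = nameValue.Groups["value"].Value.Trim();
    }
    blocks[header] = fields;
}

foreach (var block in blocks)
{
    Console.WriteLine(block.Key);
    foreach (var field in block.Value)
    {
        Console.WriteLine($"  {field.Key} = {field.Value}");
    }
}
```

Keying the outer dictionary by header assumes each header appears only once in the log; a list of pairs would be safer if headers can repeat.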


((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+

or, if you have empty lines between items:

(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
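
With a repeated group like this, .NET exposes each iteration through Group.Captures, while Group.Value gives only the last one. The following is a rough sketch of reading the first pattern that way. The two-line input is a made-up excerpt of the log, and top-level statements (C# 9 or later) are assumed. Because the content group is optional, a line without content would leave the two capture lists with different lengths, so pairing them by index is only an illustration for input where every line has a value:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: two field lines taken from the sample log, \r\n endings.
string lines =
    " Computer Name:                 TESTCMP02\r\n" +
    " Windows User Name:             testUser99\r\n";

Match match = Regex.Match(lines, @"((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+");

// Captures holds one entry per iteration of the outer (...)+ repeat.
CaptureCollection headers = match.Groups["header"].Captures;
CaptureCollection contents = match.Groups["content"].Captures;

// Pairing by index is only safe when every line has content.
for (int i = 0; i < headers.Count && i < contents.Count; i++)
{
    string name = headers[i].Value.Trim().TrimEnd(':');
    string value = contents[i].Value.Trim();
    Console.WriteLine($"{name} = {value}");
}
```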


Here is how I would go about it. This lets you read the value of a specific field easily, at the cost of a more complicated expression. I have added line breaks to make it easier to read. Here is the start:

COMPUTER INFO:.*Computer Name:\s*(?<ComputerName>[\w\s]+)
.*Windows User Name:\s*(?<WindowUserName>[\w\s]+)
.*Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+)
.* (?# This continues on through each of the lines... )

Use it with the Compiled, IgnoreCase, Singleline, and CultureInvariant options.

You can then read each value from the match's named groups, e.g.:

string computerName = match.Groups["ComputerName"].Value;
string windowUserName = match.Groups["WindowUserName"].Value;
// etc.
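
A small runnable sketch of this approach, limited to the first three fields, might look like the following. The sample text is a shortened excerpt of the log. Ending the pattern with the next label ("Processor:") is an assumption added here, because [\w\s]+ also matches line breaks and the last group would otherwise run on into the following line. For the same reason each value is trimmed. Top-level statements (C# 9 or later) are assumed:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical excerpt of the log, with \r\n line endings.
string logData =
    "COMPUTER INFO:\r\n" +
    " Computer Name:                 TESTCMP02\r\n" +
    " Windows User Name:             testUser99\r\n" +
    " Time Since Last Reboot:        405 Minutes\r\n" +
    " Processor:                     (2 processors) Intel(R) Xeon(R) CPU\r\n";

// The pattern is written as one string; the trailing "Processor:" label is an
// assumption that stops the last group at the end of its own line.
Regex infoRegex = new Regex(
    @"COMPUTER INFO:.*Computer Name:\s*(?<ComputerName>[\w\s]+)" +
    @".*Windows User Name:\s*(?<WindowUserName>[\w\s]+)" +
    @".*Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+)" +
    @".*Processor:",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.CultureInvariant);

Match match = infoRegex.Match(logData);

// Each group still carries the line break and indent that follow the value.
string computerName = match.Groups["ComputerName"].Value.Trim();
string windowUserName = match.Groups["WindowUserName"].Value.Trim();
string timeSinceLastReboot = match.Groups["TimeSinceLastReboot"].Value.Trim();

Console.WriteLine(computerName);
Console.WriteLine(windowUserName);
Console.WriteLine(timeSinceLastReboot);
```

Note that [\w\s]+ only fits values made of word characters and whitespace, so fields such as the INI file path or the SQL Server address would need a different character class.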


Some links regarding repeating groups in regular expressions:

  • http://www.regular-expressions.info/captureall.html
  • http://bytes.com/topic/python/answers/856077-how-get-all-repeated-group-regular-expression