
Writing a regex to capture text between outer parentheses

So I'm trying to parse a file that has text in this format:

outerkey = (innerkey = innervalue)

It gets more complex. This is also legal in the file:

outerkey = (innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))

So I basically want to capture only the outer key's text. I cannot guarantee that all of the text will be on one line; the value may span multiple lines. There is also more than one item in the file.

So here's my regex so far:

[^\s=]+\s*=\s*(\(\s*.*\s*\))

The goal is to simply replace the first part, [^\s=]+, with the key I want to search for and get back the entire text of the outer parentheses.

Here's the problem: my regex captures not only the text I want, but also the text from the next group, since regexes are greedy. Making it non-greedy would not work either, since it would stop capturing at the first closing parenthesis.

Ultimately, if I have the following string

foo = 
(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

I want the groups to match

(
  ifoo = ifoov
)

and

(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

Right now it will match

(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

By the way, I am running this in multiline and singleline mode.

Any ideas? Thanks!


I was able to adapt .NET's balancing group definition regex feature to this problem as follows:

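Regex r = new Regex(@"(?x) # ignore pattern whitespace, for sanity!

    (?'Key' [^=\s]* )          # the key: a run of non-space, non-= characters
    \s*=\s*
    (?'Value'
      (
         (
           [^()]*
           (?'Open'\()         # push each opening parenthesis
         )+
         (
           [^()]*
           (?'Close-Open'\))   # pop one Open for each closing parenthesis
         )+
      )+?                      # as few balanced runs as possible
    )
    (?(Open)(?!))              # fail if any Open is left unmatched

");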

We can then test it as follows:

var text = @"
foo = 
(
  ifoo = ifoov
)

bar =
(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)

outerkey = (innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))
";

foreach (Match m in r.Matches(text)) {
  Console.WriteLine("Key: [{0}]", m.Groups["Key"]);
  Console.WriteLine("Value: [{0}]", m.Groups["Value"]);
  Console.WriteLine("-------");
}
Console.WriteLine("That's all folks!");

This prints (as seen on ideone.com):

Key: [foo]
Value: [(
  ifoo = ifoov
)]
-------
Key: [bar]
Value: [(
  ibar =
    (iibar = iibarv)
    (iibar2 = iibarv2)
)]
-------
Key: [outerkey]
Value: [(innerkey = (twodeepkey = twodeepvalue)(twodeepkey2 = twodeepvalue2))]
-------
That's all folks!

The minor modifications to the example pattern from the documentation are:

  • The opening, closing, and non-bracket patterns are now \(, \), and [^()] instead of <, >, and [^<>]
  • The balanced structure is repeated with +? (at least once, but as few times as possible) instead of *
  • The "content" is matched before each parenthesis rather than after it
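
The question asks for a way to plug in one specific key and get back only its outer parentheses. Here is a minimal sketch of how the same pattern might be adapted for that. The KeyLookup class and FindValue method are hypothetical names, not part of the original answer, and the lookbehind that keeps "bar" from matching inside "ibar" is my own assumption about what the file format needs. It is only meant for top-level keys.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyLookup
{
    // Hypothetical helper: returns the balanced parenthesized value that
    // follows "key =", or null when the key is not found.
    public static string FindValue(string text, string key)
    {
        // Regex.Escape also escapes whitespace and '#', so the key stays
        // literal even though the pattern runs in (?x) mode.
        string pattern = @"(?x)
            (?<![^\s()])            # assumption: key is not the tail of a longer word
            " + Regex.Escape(key) + @"
            \s*=\s*
            (?'Value'
              (
                ( [^()]* (?'Open'\() )+
                ( [^()]* (?'Close-Open'\)) )+
              )+?
            )
            (?(Open)(?!))           # fail if any Open is left unmatched
        ";

        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups["Value"].Value : null;
    }
}
```

With the test text above, KeyLookup.FindValue(text, "bar") should return the whole block from the parenthesis after "bar =" through its matching close, without running on into the outerkey entry.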


Generally speaking, regular expressions cannot count, so matching nested parentheses is not easy to accomplish. .NET, however, has a feature called "balancing group definitions". The example in the documentation shows how to match paired angle brackets and should get you there.
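
If balancing groups are not available, the counting can be done by hand. Here is a rough sketch of that alternative, a plain depth counter rather than a regex. ParenScanner and ReadBalanced are hypothetical names, and the sketch assumes that parentheses never appear inside the keys or values themselves.

```csharp
using System;

public static class ParenScanner
{
    // Hypothetical helper: returns the first balanced "(...)" group that
    // starts at or after startIndex, or null if none is found.
    public static string ReadBalanced(string text, int startIndex)
    {
        int open = text.IndexOf('(', startIndex);
        if (open < 0) return null;          // no opening parenthesis at all

        int depth = 0;
        for (int i = open; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                depth++;                    // one level deeper
            }
            else if (text[i] == ')' && --depth == 0)
            {
                // Back at depth zero: the outer group is complete.
                return text.Substring(open, i - open + 1);
            }
        }
        return null;                        // input ended before the group closed
    }
}
```

To read the value for a given key, one could find the position just past "key =" and pass it as startIndex.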
