Regex matching to extract multi-line text regions (C#)

I'm looking to capture text regions in a large text block, created in the following format:

...
[region:region-name]
multi line
text block
[/region]
...
[region:another-region-name]
more
multi-line text
[/region]

I have this almost worked out with

\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]

This works when there is only one region in the entire text. But when there are multiple, it gives me just one match, with every other region included in the 'text' of that one. I have a feeling this should be solved with a negative lookahead, but not being a regex pro, I don't know how to modify the above to do it right. Can someone help?


You can do this without lookahead:

\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]

The additional ? makes the * quantifier lazy, so it will match as few characters as possible. And the (?s) allows the dot to match newlines after this position, so you don't have to use the (.|[\r\n]) construction (an alternative would be [\s\S]).
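
As a minimal sketch of how this pattern might be used from C#, the following collects every region with Regex.Matches. The class name RegionExtractor, the method Extract, and the sample input are hypothetical and only serve the illustration; trimming the captured text is also an assumption, made here because the capture includes the line breaks next to the tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegionExtractor
{
    // Pattern from the answer above: the lazy .*? stops at the first [/region],
    // and the inline (?s) lets the dot match newlines inside the 'text' group.
    private static readonly Regex RegionPattern =
        new Regex(@"\[region:(?'link'.*)\](?'text'(?s).*?)\[/region\]");

    // Returns (region name, region text) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var regions = new List<KeyValuePair<string, string>>();
        foreach (Match m in RegionPattern.Matches(input))
        {
            regions.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value,
                m.Groups["text"].Value.Trim())); // Trim is an illustrative choice
        }
        return regions;
    }

    public static void Main()
    {
        // Hypothetical sample input mirroring the format in the question.
        string input =
            "...\n[region:region-name]\nmulti line\ntext block\n[/region]\n" +
            "...\n[region:another-region-name]\nmore\nmulti-line text\n[/region]\n";

        foreach (var region in Extract(input))
        {
            Console.WriteLine(region.Key + ":");
            Console.WriteLine(region.Value);
            Console.WriteLine("---");
        }
    }
}
```

If the dot should match newlines across the whole pattern, passing RegexOptions.Singleline to the Regex constructor is the option-based equivalent of (?s); here that would also let the 'link' group run across lines, so the inline form is kept.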


You don't need a negative lookahead; you just need to change (?'text'(.|[\r\n])*) to be "non-greedy", so that it matches up to the first instance of [/region] rather than the last. You can do this by adding ? after *, so the resulting pattern would be:

\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]
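
To see the difference, a small sketch can count the matches each pattern produces on the same input. The class GreedyVsLazyDemo, the helper CountRegions, and the two-region sample string are hypothetical names chosen for this illustration. On this input the greedy pattern should yield a single match spanning both regions, while the lazy one should yield one match per region.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: the original pattern from the question.
    public const string Greedy =
        @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*)\[/region\]";

    // Lazy: the same pattern with ? added after *, as suggested above.
    public const string Lazy =
        @"\[region:(?'link'.*)\](?'text'(.|[\r\n])*?)\[/region\]";

    // Counts how many separate matches the pattern finds in the input.
    public static int CountRegions(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Hypothetical input with two regions.
        string input =
            "[region:a]\none\n[/region]\n...\n[region:b]\ntwo\n[/region]\n";

        Console.WriteLine("greedy matches: " + CountRegions(Greedy, input));
        Console.WriteLine("lazy matches:   " + CountRegions(Lazy, input));
    }
}
```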