
C# regexp for nested tags

Let's start with a small example. I have the following text:

[[ some tag [[ with tag nested ]] and again ]]

I'd like to match [[ with tag nested ]] but not [[ some tag [[ with tag nested ]]. The simple pattern

\[\[(?<content>.+?)\]\]

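obviously didn't work: the lazy .+? stops at the first ]], but the match still begins at the first [[, so the captured content includes the inner [[. So I created this regex: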

\[\[(?!.*?\[\[.*?\]\].*?)(?<content>.+?)\]\]

Unfortunately, it doesn't match anything in C# (with RegexOptions.Singleline), while PHP's preg_match works perfectly.

Any clues/ideas? Any help would be much appreciated.


The simplest way I know of to find just one of the innermost tags is this:

var match = Regex.Match(input, @"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

This works because the greedy ^.* pushes the start of the match to the last [[ in the input (no [[ comes after it, so it can’t contain any nested tags), and the lazy (.*?) then stops at the immediately following ]]. Of course, this assumes well-formed input; if the opening and closing brackets don’t pair up properly, this can fail.
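A minimal, self-contained C# sketch of that one-match approach, assuming well-formed input. The class name InnermostTag and the helper Find are hypothetical names chosen for illustration. It uses the same pattern and RegexOptions.Singleline as above and returns group 1, which is the innermost tag including its brackets:

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnermostTag
{
    // Greedy ^.* runs to the end of the input and backtracks to the last "[["
    // that is followed by a "]]"; the lazy (.*?) then stops at the first "]]".
    private static readonly Regex Innermost =
        new Regex(@"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

    // Hypothetical helper: returns the innermost tag (brackets included),
    // or null when the input contains no "[[ ... ]]" pair.
    public static string? Find(string input)
    {
        Match match = Innermost.Match(input);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string input = "[[ some tag [[ with tag nested ]] and again ]]";
        Console.WriteLine(Find(input)); // prints [[ with tag nested ]]
    }
}
```

Because Singleline lets . match newlines, the same pattern also works when a tag spans several lines.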

Once you’ve found the innermost bracket, you could remove it from the input string:

input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);

and then repeat the process in a while loop until the regular expression no longer matches.
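The remove-and-repeat loop might look roughly like this. It is a sketch that assumes well-formed input, and NestedTagExtractor and ExtractInnermostFirst are hypothetical names. Each pass records the content of the current innermost tag (group 2) and cuts the whole tag (group 1) out of the string, so tags come out innermost first:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedTagExtractor
{
    // Same pattern as above: group 1 = whole innermost tag, group 2 = its content.
    private static readonly Regex Innermost =
        new Regex(@"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

    // Hypothetical helper: collects tag contents innermost-first by repeatedly
    // matching the innermost tag and removing it from the input.
    // Assumes well-formed input (every "[[" has a matching "]]").
    public static List<string> ExtractInnermostFirst(string input)
    {
        var contents = new List<string>();
        Match match = Innermost.Match(input);
        while (match.Success)
        {
            contents.Add(match.Groups[2].Value);
            Group tag = match.Groups[1];
            input = input.Remove(tag.Index, tag.Length);
            match = Innermost.Match(input);
        }
        return contents;
    }

    public static void Main()
    {
        string input = "[[ some tag [[ with tag nested ]] and again ]]";
        foreach (string content in ExtractInnermostFirst(input))
            Console.WriteLine("[" + content + "]");
    }
}
```

For the sample text this yields " with tag nested " and then " some tag  and again " (with a double space where the inner tag was removed). Each pass removes at least the four bracket characters, so the loop always terminates.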


Would this be a valid match?

[[ with [ single ] brackets ]]

If not, this regex should do:

\[\[(?<content>[^][]*)\]\]

[^][] matches any character that's not [ or ]. If single brackets are allowed, try this:

\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]

(?!\[\[|\]\]). matches any character, but only after making sure it's not the start of a [[ or ]] sequence.
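A small C# check of both patterns against the two example strings; FlatTagMatcher and FirstContent are hypothetical names. The first character class is written with escaped brackets, [^\[\]], which should be equivalent to [^][] but does not rely on ] being read as a literal right after [^. Singleline is added to the second pattern only so that its . also crosses line breaks:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FlatTagMatcher
{
    // Content may not contain any '[' or ']' at all.
    public static readonly Regex NoBrackets =
        new Regex(@"\[\[(?<content>[^\[\]]*)\]\]");

    // Content may contain single brackets, but not "[[" or "]]": each character
    // is consumed only after the negative lookahead confirms that no "[[" or
    // "]]" starts at that position (a "tempered" dot).
    public static readonly Regex NoDoubleBrackets =
        new Regex(@"\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]", RegexOptions.Singleline);

    // Hypothetical helper: content of the first match, or null if none.
    public static string? FirstContent(Regex regex, string input)
    {
        Match match = regex.Match(input);
        return match.Success ? match.Groups["content"].Value : null;
    }

    public static void Main()
    {
        string nested = "[[ some tag [[ with tag nested ]] and again ]]";
        string single = "[[ with [ single ] brackets ]]";

        Console.WriteLine(FirstContent(NoBrackets, nested));                        // prints " with tag nested "
        Console.WriteLine(FirstContent(NoDoubleBrackets, nested));                  // prints " with tag nested "
        Console.WriteLine(FirstContent(NoBrackets, single) ?? "(no match)");        // prints "(no match)"
        Console.WriteLine(FirstContent(NoDoubleBrackets, single) ?? "(no match)");  // prints " with [ single ] brackets "
    }
}
```

On the nested sample both patterns skip the outer [[ and match the inner tag; on [[ with [ single ] brackets ]] only the second pattern matches.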
