C# regexp for nested tags
Let's start with a little example. I have the following text:
[[ some tag [[ with tag nested ]] and again ]]
I'd like to match [[ with tag nested ]] but not [[ some tag [[ with tag nested ]]. Simple
\[\[(?<content>.+?)\]\]
obviously didn't work. So I created this regexp:
\[\[(?!.*?\[\[.*?\]\].*?)(?<content>.+?)\]\]
Unfortunately it doesn't match anything using C# (with RegexOptions.Singleline), while PHP's preg_match works perfectly.
Any clues/ideas? Any help would be much appreciated.
The simplest way that I know of to find just one of the innermost brackets is this:
var match = Regex.Match(input, @"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);
This works because it finds the last [[ (so there are no more [[ after it, so it can’t contain any nested tags) and then the immediately following ]]. Of course, this assumes well-formedness; if you have a string where the start/end brackets don’t match up properly, this can fail.
Once you’ve found the innermost bracket, you could remove it from the input string:
input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);
and then repeat the process in a while loop until the regular expression no longer matches.
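The find-and-remove loop described above could be sketched roughly like this. The class and method names (InnermostTags, ExtractAll) are made up for illustration, and the sketch assumes well-formed input, as noted earlier:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class InnermostTags
{
    // Same pattern as above: the greedy ^.* pushes the match to the last "[[".
    private static readonly Regex LastTag =
        new Regex(@"^.*(\[\[(.*?)\]\])", RegexOptions.Singleline);

    // Collects the content of every tag, starting with the last "[[" in the
    // input and working outwards. Each found tag is removed before the next search.
    public static List<string> ExtractAll(string input)
    {
        var contents = new List<string>();
        var match = LastTag.Match(input);
        while (match.Success)
        {
            // Group 2 is the text between the brackets.
            contents.Add(match.Groups[2].Value);
            // Group 1 is the whole tag including "[[" and "]]"; cut it out.
            input = input.Remove(match.Groups[1].Index, match.Groups[1].Length);
            match = LastTag.Match(input);
        }
        return contents;
    }
}
```

Calling InnermostTags.ExtractAll on the example text from the question should yield the nested content first, followed by the outer content with the nested tag already cut out of it. The loop always terminates because each pass removes at least the four bracket characters.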
Would this be a valid match?
[[ with [ single ] brackets ]]
If not, this regex should do:
\[\[(?<content>[^][]*)\]\]
[^][] matches any character that's not [ or ].
If single brackets are allowed, try this:
\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]
(?!\[\[|\]\]). matches any character, but only after making sure it's not the start of a [[ or ]] sequence.
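To compare the two patterns side by side, something like the following should work. TagPatterns and Contents are just illustrative names, and RegexOptions.Singleline is assumed so that tags spanning several lines are handled by the second pattern as well:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out both patterns from this answer.
public static class TagPatterns
{
    // Content may not contain any '[' or ']' at all.
    public const string NoBrackets = @"\[\[(?<content>[^][]*)\]\]";

    // Content may contain single brackets, but never "[[" or "]]".
    public const string SingleBracketsAllowed =
        @"\[\[(?<content>(?:(?!\[\[|\]\]).)*)\]\]";

    // Returns the "content" group of every match of the given pattern.
    public static List<string> Contents(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups["content"].Value);
        }
        return results;
    }
}
```

On the text from the question, both patterns should find only the innermost tag. On [[ with [ single ] brackets ]], the first pattern finds nothing, while the second returns the whole content including the single brackets.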