How can I parse nested blocks using Regex? [duplicate]
Possible Duplicates:
RegEx match open tags except XHTML self-contained tags
.NET Regex balancing groups expression - matching when not balanced
For example, if I had the input:
[quote]He said: [quote]I have no idea![/quote] But I disagree![/quote] And another quote: [quote]Some other quote here.[/quote]
How can I effectively grab blocks of quotes using regular expressions without grabbing too much or too little? For example, if I use:
\[quote\](.+)\[/quote\]
This will grab too much (basically, the entire thing), whereas this:
\[quote\](.+?)\[/quote\]
will grab too little (it will only grab [quote]He said: [quote]I have no idea![/quote], with mismatched start/end tags).
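To make the two failure modes concrete, here is a minimal sketch using Python's `re` module. It assumes the tags are written in lower case to match the sample input; the variable names are only illustrative.

```python
import re

text = (
    "[quote]He said: [quote]I have no idea![/quote] But I disagree![/quote] "
    "And another quote: [quote]Some other quote here.[/quote]"
)

# Greedy: .+ runs to the end of the input, then backtracks to the LAST [/quote].
greedy = re.search(r"\[quote\](.+)\[/quote\]", text)

# Lazy: .+? stops at the FIRST [/quote], which closes the inner quote,
# so the match starts at the outer opener but ends at the inner closer.
lazy = re.search(r"\[quote\](.+?)\[/quote\]", text)

print(greedy.group(0) == text)  # the greedy match covers the entire input
print(lazy.group(0))            # the lazy match is unbalanced
```

Neither quantifier counts how many opening tags it has passed, which is why neither can find the matching closing tag.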
So how can I effectively parse nested blocks of code like this using Regex?
Regexes and nesting do not work well together. It's possible (but, depending on the regex dialect you're using, potentially very cumbersome) to construct a regex that matches only an innermost pair. However, if you want to match an entire quote with nested quotes inside, then regular expressions are simply not a strong enough tool. You'll need to look into context-free parser technology, or perform successive replacements that rewrite the nested quotes to something else before matching the outer ones.
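As a rough illustration of the successive-replacement idea, the Python sketch below matches only innermost pairs, swaps each one for a placeholder, and repeats until nothing is left to match. The names `INNERMOST`, `PLACEHOLDER` and `extract_quotes` are made up for this example, and it assumes lower-case tags and input that contains no NUL characters (they are used as placeholder delimiters).

```python
import re

# An innermost pair: the body may not contain any opening or closing quote tag.
INNERMOST = re.compile(r"\[quote\]((?:(?!\[/?quote\]).)*)\[/quote\]", re.DOTALL)
# Placeholder left behind for an already-extracted block (assumes no NULs in the input).
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def extract_quotes(text):
    """Return every balanced [quote]...[/quote] block, inner blocks before outer ones."""
    blocks = []

    def restore(fragment):
        # Put previously extracted blocks back in place of their placeholders.
        return PLACEHOLDER.sub(lambda m: blocks[int(m.group(1))], fragment)

    def stash(match):
        # Store the fully expanded block and leave a numbered placeholder behind.
        blocks.append(restore(match.group(0)))
        return f"\x00{len(blocks) - 1}\x00"

    while True:
        text, count = INNERMOST.subn(stash, text)
        if count == 0:  # no innermost pair left, so we are done
            break
    return blocks


sample = (
    "[quote]He said: [quote]I have no idea![/quote] But I disagree![/quote] "
    "And another quote: [quote]Some other quote here.[/quote]"
)

for block in extract_quotes(sample):
    print(block)
```

On the sample input this yields three blocks: the two quotes that contain no other quote first, then the outer quote with its nested quote restored. Each pass peels off one nesting level, so the loop runs once per level of depth plus one final pass that finds nothing. Unbalanced leftover tags are simply never matched.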
Take a look at my XML indenter: it uses one group to match from the beginning tag to the last tag, and another group to get the content recursively.