How to fix a BBcode regular expression
I have a regular expression that grabs BBcode tags. It works great except for a minor glitch.
Here is the current expression:
\[([^=\[\]]+)[=\x22']*([^ \[\]]*)['\x22]*\](.+)\[/\1\]
Here is some text it successfully matches against and the groups it builds:
[url=]Go to google![/url]
1: url 2: 3: Go to google!
[img][/img]
1: img 2: NULL 3:
[quote][quote]first nested quote[/quote][quote]second nested quote[/quote][/quote]
1: quote 2: NULL 3: [quote]first nested quote[/quote][quote]second nested quote[/quote]
All of this is great. I can handle nested tags by running the 3rd match group against the same regex and recursively handle all tags that are nested. The problem is with the example using the [quote] tags. Notice that the 3rd match group is a set of two quote tags, so we would expect two matches. However, we get one match, like this:
[quote]first nested quote[/quote][quote]second nested quote[/quote]
1: quote 2: NULL 3: first nested quote[/quote][quote]second nested quote
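To reproduce the single greedy match, here is a minimal sketch that assumes Python's `re` module (the question does not say which engine is in use, so that choice is an assumption). The pattern is copied unchanged; only the variable names are made up for illustration. Note that Python reports the unused 2nd group as an empty string rather than NULL.

```python
import re

# The original pattern from the question, unchanged.
# GREEDY and siblings are illustrative names, not from the question.
GREEDY = re.compile(r"""\[([^=\[\]]+)[=\x22']*([^ \[\]]*)['\x22]*\](.+)\[/\1\]""")

siblings = "[quote]first nested quote[/quote][quote]second nested quote[/quote]"

matches = [m.groups() for m in GREEDY.finditer(siblings)]

# (.+) is greedy, so it runs to the end of the text and backtracks only
# as far as the LAST [/quote], swallowing the first closing tag.
for tag, value, body in matches:
    print(tag, repr(value), body)
# prints: quote '' first nested quote[/quote][quote]second nested quote
```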
Ahhhh! That's not what we wanted at all. There is a fairly simple fix. I change the regex from this:
\[([^=\[\]]+)[=\x22']*([^ \[\]]*)['\x22]*\](.+)\[/\1\]
To this:
\[([^=\[\]]+)[=\x22']*([^ \[\]]*)['\x22]*\](((?!\[/\1\]).)+)\[/\1\]
By adding ((?!\[/\1\]).) we stop the 3rd match group from ever containing the closing BBcode tag: the negative lookahead is checked before every character, so the group ends at the first [/tag] it meets. So now this works, and we get two matches:
[quote]first nested quote[/quote][quote]second nested quote[/quote]
[quote]first nested quote[/quote]
1: quote 2: NULL 3: first nested quote
[quote]second nested quote[/quote]
1: quote 2: NULL 3: second nested quote
I was happy that fixed it, but now we have another problem. This new regex fails on the first example, where we nest the two quote tags under one larger quote tag. We get two matches instead of one:
[quote][quote]first nested quote[/quote][quote]second nested quote[/quote][/quote]
[quote][quote]first nested quote[/quote]
1: quote 2: NULL 3: [quote]first nested quote
[quote]second nested quote[/quote]
1: quote 2: NULL 3: second nested quote
The first match is all wrong and the second match, while well-formed, is not a desired match. We wanted one big match with the 3rd match group being the two nested quote tags, like when we used the first expression.
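The trade-off between the two inputs can be checked side by side. This is a sketch under the same assumption as before (Python's `re`, which may not be the engine the question uses). The helper name `bodies` is hypothetical. The lookahead makes the 3rd group stop at the first closing tag, which is right for sibling tags and wrong as soon as the same tag is nested.

```python
import re

# The modified pattern from the question, unchanged. Group 3 is the
# outer group; group 4 is the single character inside the repetition.
TEMPERED = re.compile(
    r"""\[([^=\[\]]+)[=\x22']*([^ \[\]]*)['\x22]*\](((?!\[/\1\]).)+)\[/\1\]"""
)

siblings = "[quote]first nested quote[/quote][quote]second nested quote[/quote]"
nested = "[quote]" + siblings + "[/quote]"


def bodies(text):
    """Return the 3rd match group of every match in text."""
    return [m.group(3) for m in TEMPERED.finditer(text)]


# Sibling tags: two clean matches.
print(bodies(siblings))
# Nested tags: the outer opening tag pairs with the first inner closing
# tag, so the first body is cut short and the last [/quote] is orphaned.
print(bodies(nested))
```

A regex with no notion of nesting depth cannot tell an inner [/quote] from the outer one, which is why neither version handles both inputs.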
Any suggestions? If I can just cross this gap I should have a fairly powerful BBcode expression.
Using balancing groups you can construct a regex like this:
\[ (?<tag>[^][/=\s]+) \s*
(?: = \s* (?<val>[^][]*) \s*)?
Simplified according to Kobi's example.
In the following:
It finds these matches:
Full example at
(Old version
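Balancing groups are a .NET regex feature. Where they are not available, the same pairing can be approximated by counting depth in code instead of inside the pattern. The following is only a sketch of that swapped-in technique, written in Python, and is not the regex from this answer. The names `OPEN`, `top_level_tags` and `parse` are hypothetical, and the tag syntax it accepts is a simplifying assumption.

```python
import re

# Assumed tag syntax: [name], [name=value], [name="value"].
OPEN = re.compile(r"""\[([^\[\]/=\s]+)(?:=\s*["']?([^\[\]"']*)["']?)?\s*\]""")


def top_level_tags(text):
    """Return (tag, value, body) for each outermost balanced tag pair."""
    results = []
    pos = 0
    while True:
        opener = OPEN.search(text, pos)
        if opener is None:
            break
        name, value = opener.group(1), opener.group(2)
        # Matches both [name...] and [/name...] for this tag name only.
        same = re.compile(r"\[(/?)%s(?:=[^\[\]]*)?\]" % re.escape(name))
        depth = 1
        for tok in same.finditer(text, opener.end()):
            depth += -1 if tok.group(1) else 1
            if depth == 0:  # found the closing tag that balances the opener
                results.append((name, value, text[opener.end():tok.start()]))
                pos = tok.end()
                break
        else:
            pos = opener.end()  # unbalanced opener: skip it and move on
    return results


def parse(text):
    """Recurse into each body, as the question proposes. In this sketch,
    plain text sitting between nested tags is dropped."""
    return [(n, v, parse(b) or b) for n, v, b in top_level_tags(text)]


siblings = "[quote]first nested quote[/quote][quote]second nested quote[/quote]"
nested = "[quote]" + siblings + "[/quote]"

print(top_level_tags(siblings))  # two entries, one per sibling
print(top_level_tags(nested))    # one entry whose body is the two siblings
print(parse(nested))
```

With this approach the nested input yields one outer match and the sibling input yields two, which is the behaviour the question asks for. A tag with no `=` reports its value as `None`, while `[url=]` reports an empty string.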