Regular expression to match anything but 2 consecutive curly braces
What regular expression would match anything except the parts wrapped in 2 consecutive curly braces ({{ ... }})?
An example string: {{some text}} string I want {{another set {{and inner}} }}
I want to get only: string I want
Using a stack had crossed my mind, but I wanted to know if this can be done using regex.
I'm using PHP's PCRE.
Thanks in advance.

Use a lookahead assertion (?!{{|}}) to verify that you don't have a nested set of braces inside of your outer set:
{{((?!{{|}}).)*}}
Test program:
<?php
$string = '{{lot {{of}} characters}}';
// Each pass removes the innermost {{...}} groups; stop once nothing changes.
for (;;)
{
    var_dump($string);
    $replacement = preg_replace('/{{((?!{{|}}).)*}}/', '', $string);
    if ($string == $replacement)
        break;
    $string = $replacement;
}
Output
string(25) "{{lot {{of}} characters}}"
string(19) "{{lot  characters}}"
string(0) ""
It appears to handle various edge cases reasonably, as well:
# Unbalanced braces.
string(23) "{{lot {{of}} characters"
string(17) "{{lot  characters"
string(23) "lot {{of}} characters}}"
string(17) "lot  characters}}"
# Multiple sets of braces.
string(25) "{{lot }}of{{ characters}}"
string(2) "of"
# Lone curlies.
string(41) "{{lot {{of {single curly} }} characters}}"
string(19) "{{lot  characters}}"
string(0) ""
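Applied to the example string from the question, the same loop can be wrapped in a small function. This is only a minimal sketch: stripBraceGroups is a hypothetical name, and the escaped braces, the s modifier and the final trim() are additions that are not part of the test program above.

```php
<?php
// Hypothetical helper: repeatedly strip innermost {{...}} groups until the
// string stops changing, mirroring the loop in the test program.
function stripBraceGroups(string $text): string
{
    do {
        $previous = $text;
        // Tempered dot: consume a character only if it does not start {{ or }}.
        // The s modifier (an assumption here) lets the dot cross newlines.
        $text = preg_replace('/\{\{(?:(?!\{\{|\}\}).)*\}\}/s', '', $previous);
    } while ($text !== null && $text !== $previous);

    // preg_replace returns null on error; fall back to the last good value.
    return $text ?? $previous;
}

$input  = '{{some text}} string I want {{another set {{and inner}} }}';
$result = trim(stripBraceGroups($input));
var_dump($result); // string(13) "string I want"
```

The first pass removes {{some text}} and {{and inner}}, the second removes what is left of the outer group, and trim() drops the surrounding spaces.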
If you need to do something more complicated with the contents, such as processing the contents or the variables, then you can use a recursive regexp, making use of the (?R) operator.
$data = "{{abcde{{fg{{hi}}jk}}lm}}";
$regexp = "#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#";
$count = 0;
function revMatch($matches) {
    global $regexp, $count;
    if (is_array($matches)) {
        // Match detected, process for nested components
        $subData = preg_replace_callback($regexp, 'revMatch', $matches[1]);
    } else {
        // No match, leave text alone
        $subData = $matches;
    }
    // This numbers each match, to demonstrate call order
    return "(" . $count++ . ":<" . $subData . ">)";
}
echo preg_replace_callback($regexp, 'revMatch', $data);
This converts: {{abcde{{fg{{hi}}jk}}lm}}
to (2:<abcde(1:<fg(0:<hi>)jk>)lm>)
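The same traversal can also be written without global variables. The sketch below is one possible arrangement, not the answer's own code: numberBraceGroups is a hypothetical name, and the pattern and counter are captured by a recursive closure instead.

```php
<?php
// Hypothetical helper: same numbering scheme as revMatch, but the pattern
// and the counter are local and shared with the closure via "use".
function numberBraceGroups(string $data): string
{
    $regexp = '#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#';
    $count  = 0;

    // $walk is captured by reference so the closure can call itself.
    $walk = function (array $matches) use (&$walk, &$count, $regexp): string {
        // Process the captured contents first, so inner groups get lower numbers.
        $inner = preg_replace_callback($regexp, $walk, $matches[1]);
        return '(' . $count++ . ':<' . $inner . '>)';
    };

    return preg_replace_callback($regexp, $walk, $data);
}

echo numberBraceGroups('{{abcde{{fg{{hi}}jk}}lm}}'), "\n";
// (2:<abcde(1:<fg(0:<hi>)jk>)lm>)
```

Because the counter is local, each call to the function starts numbering from 0 again.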
A bit of explanation on the regexp: #\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#
The double braces at the front and back match any target component. The contents of the braces must be one or more of the two defined options:
  a run of characters without braces: [^(\{\{)(\}\})]+ (this is a character class, so it excludes every single {, }, ( and ), not only doubled braces)
  the whole regexp repeated, via (?R)
The (?:) bracket is a non-capturing group.
NB. The #s are the pattern delimiters; I thought extra slashes would decrease readability further.
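Since the class in that pattern rejects lone braces and parentheses, contents such as {lone} or (paren) stop the outer match. One possible variant, shown here only as a sketch, swaps the class for the lookahead-guarded dot from the first answer. The $variant pattern, the s modifier and the sample $data are assumptions made for illustration.

```php
<?php
// Pattern from the answer: the class excludes {, }, ( and ) one character at a time.
$original = '#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#';
// Assumed variant: any character is allowed unless it starts {{ or }}.
$variant  = '#\{\{((?:(?!\{\{|\}\}).|(?R))+)\}\}#s';

$data = '{{outer {lone} (paren) {{inner}} tail}}';

preg_match($original, $data, $m1);
preg_match($variant, $data, $m2);

var_dump($m1[0]); // "{{inner}}": the outer group is skipped because of the lone brace
var_dump($m2[0]); // the whole string
var_dump($m2[1]); // "outer {lone} (paren) {{inner}} tail"
```

The variant can be passed to preg_replace_callback in the same way as the original pattern.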