Regular expression to match anything but 2 consecutive curly braces

What would be the regular expression to match anything but 2 consecutive curly braces ({{)?

An example string:

{{some text}} string I want {{another set {{and inner}} }}

I want to get only " string I want ", i.e. the text that lies outside all of the {{...}} blocks.

Using a stack had crossed my mind, but I wanted to know if this can be done using regex.

I'm using PHP's PCRE

Thanks in advance


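Use a negative lookahead assertion (?!{{|}}) to verify that there is no nested set of braces inside the outer set. The pattern therefore matches only the innermost {{...}} pairs; removing those repeatedly, until nothing changes, strips nested sets from the inside out.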

{{((?!{{|}}).)*}}
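
To see which parts of the question's example string this pattern picks up, here is a minimal sketch. The braces are escaped and the group is made non-capturing for readability; both are my changes and should not alter what is matched.

```php
<?php
// Same idea as the pattern above: "." may only consume a character
// if neither "{{" nor "}}" starts at that position.
$pattern = '/\{\{(?:(?!\{\{|\}\}).)*\}\}/';
$subject = '{{some text}} string I want {{another set {{and inner}} }}';

// $matches[0] holds the full-pattern matches, in order of appearance.
preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```

Only {{some text}} and {{and inner}} are matched here. The outer {{another set ... }} block is skipped on this pass because it still contains a nested pair.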

Test program

<?php
$string = '{{lot {{of}} characters}}';

// Each pass removes the innermost {{...}} pairs; stop when a pass changes nothing.
for (;;)
{
    var_dump($string);
    $replacement = preg_replace('/{{((?!{{|}}).)*}}/', '', $string);

    if ($string === $replacement)
        break;

    $string = $replacement;
}

Output

string(25) "{{lot {{of}} characters}}"
string(19) "{{lot  characters}}"
string(0) ""

It appears to handle various edge cases reasonably, as well:

# Unbalanced braces.
string(23) "{{lot {{of}} characters"
string(17) "{{lot  characters"

string(23) "lot {{of}} characters}}"
string(17) "lot  characters}}"

# Multiple sets of braces.
string(25) "{{lot }}of{{ characters}}"
string(2) "of"

# Lone curlies.
string(41) "{{lot {{of {single curly} }} characters}}"
string(19) "{{lot  characters}}"
string(0) ""
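
Applied to the string from the question, the same loop can be wrapped in a small function. This is only a sketch: stripDoubleBraced() is a hypothetical helper name, and the "s" modifier (so that "." also matches newlines) is an addition that the original pattern does not have.

```php
<?php
// stripDoubleBraced() is a hypothetical helper name, not part of PHP.
function stripDoubleBraced(string $text): string
{
    // The "s" modifier lets "." match newlines too (an assumption beyond the original pattern).
    $pattern = '/\{\{(?:(?!\{\{|\}\}).)*\}\}/s';

    do {
        // $count receives the number of replacements made in this pass.
        $text = preg_replace($pattern, '', $text, -1, $count);
    } while ($count > 0);

    return $text;
}

$input = '{{some text}} string I want {{another set {{and inner}} }}';
var_dump(trim(stripDoubleBraced($input)));
```

After trimming the surrounding spaces, this leaves "string I want". Using the $count argument of preg_replace avoids comparing the string before and after each pass.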


If you need to do something more complicated with the contents, such as processing the contents or the variables, then you can use a recursive regexp, making use of the (?R) operator.

$data = "{{abcde{{fg{{hi}}jk}}lm}}";
$regexp = "#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#";
$count = 0;

function revMatch($matches) {
  global $regexp, $count;

  if (is_array($matches)) {
    // Match detected, process for nested components
    $subData = preg_replace_callback($regexp, 'revMatch', $matches[1]);
  } else {
    // No match, leave text alone
    $subData = $matches;
  }

  // This numbers each match, to demonstrate call order
  return "(" . $count++ . ":<" . $subData . ">)";
}

echo preg_replace_callback($regexp, 'revMatch', $data);

This converts: {{abcde{{fg{{hi}}jk}}lm}} to (2:<abcde(1:<fg(0:<hi>)jk>)lm>)


A bit of explanation on the regexp: #\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#

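The double braces at the front and back delimit a target component. The contents between them must be one or more of the following two options:

  1. a run of characters matched by [^(\{\{)(\}\})]+. Note that this is a character class, so it is equivalent to [^(){}]+: it excludes every single brace and parenthesis, not just double braces, and a lone { or }, or a parenthesis, inside the content will prevent a match.

  2. the whole regexp repeated, via (?R).

The (?:...) bracket is a non-capturing group.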

NB: The #s are the pattern delimiters; I thought extra slashes would decrease readability further.
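
If the contents may include lone braces or parentheses, one possible variant (my assumption, not part of the answer above) combines (?R) with the negative lookahead from the first answer in place of the character class. renderNested() is a hypothetical name, and a counter passed by reference stands in for the global variable.

```php
<?php
// Hypothetical variant: each step consumes either one character that does
// not start "{{" or "}}", or a whole nested {{...}} block via (?R).
$pattern = '#\{\{((?:(?!\{\{|\}\}).|(?R))*)\}\}#s';

function renderNested(string $text, string $pattern, int &$counter): string
{
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($pattern, &$counter) {
            // Process nested components first, then number this match.
            $inner = renderNested($m[1], $pattern, $counter);
            return '(' . $counter++ . ':<' . $inner . '>)';
        },
        $text
    );
}

$counter = 0;
echo renderNested('{{abcde{{fg{{hi}}jk}}lm}}', $pattern, $counter), "\n";

$counter = 0;
echo renderNested('{{a (b) {c} {{d}} }}', $pattern, $counter), "\n";
```

The first call produces the same result as the original pattern, (2:<abcde(1:<fg(0:<hi>)jk>)lm>). The second input contains parentheses and single braces, which the character-class version would refuse to match. Inputs where a single } sits directly before a closing }} remain ambiguous and are not handled specially here.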
