PHP Regular expression to capture code

I have been trying to capture code blocks in a similar fashion to wiki tags:

{{code:
      code goes here
   }}

Example code is shown below:

$strings = array('AbCd1zyZ9', 'foo!#$bar');
foreach ($strings as $testcase) {
    if (ctype_alnum($testcase)) {
        echo "The string $testcase consists of all letters or digits.\n";
    } else {
        echo "The string $testcase does not consist of all letters or digits.\n";
    }
}

Essentially I want to capture anything between the {{..}}. There are multiple blocks like this embedded in an HTML page.

I would appreciate any help.


To start off, regex is not a good way to solve this problem. The right approach is to write a parser that understands the language's semantics and can tease out the subtleties. Having said that, if you still want a quick and dirty regex-based approach that will work 99.99% of the time but has a couple of acknowledged bugs (see the end of this answer), here you go:

You can use preg_match_all(). Here is a proof of concept:

// Every $ inside the double-quoted string is escaped so PHP does not
// interpolate variables into the sample code.
$input = "
<html>
    <head>
        <title>{{code:echo 'Hello World';}}</title>
    </head>
    <body>
        <h1>{{code:\$strings = array('AbCd1zyZ9', 'foo!#\$bar');
foreach (\$strings as \$testcase) {
    if (ctype_alnum(\$testcase)) {
        echo \"The string \$testcase consists of all letters or digits.\\n\";
    } else {
        echo \"The string \$testcase does not consist of all letters or digits.\\n\";
    }
}
}}</h1>
    </body>
</html>
";

// [^\x00]*? lazily matches any character except NUL, newlines included,
// so the capture stops at the first }} it meets.
$matches = array();
preg_match_all('/{{code:([^\x00]*?)}}/', $input, $matches);

print_r($matches[1]);

Outputs the following:

Array
(
    [0] => echo 'Hello World';
    [1] => $strings = array('AbCd1zyZ9', 'foo!#$bar');
foreach ($strings as $testcase) {
    if (ctype_alnum($testcase)) {
        echo "The string $testcase consists of all letters or digits.\n";
    } else {
        echo "The string $testcase does not consist of all letters or digits.\n";
    }
}

)

Be careful. There are some edge case bugs involving early termination by encountering }} within a "code" block:

  1. If }} appears in a quoted string, the regex matches too early
  2. If } is the last character of your "code" block and it's immediately followed by }}, you'll lose the closing } from your code block.
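Both edge cases can be reproduced with a minimal sketch like the one below. It reuses the pattern from the proof of concept; the two sample inputs and the variable names `$m1` and `$m2` are made up for illustration.

```php
<?php
// Same lazy pattern as the proof of concept above.
$pattern = '/{{code:([^\x00]*?)}}/';

// Case 1: "}}" inside a quoted string ends the match too early.
preg_match($pattern, '<div>{{code:$myvar = \'}}\';}}</div>', $m1);
echo $m1[1], "\n"; // $myvar = '

// Case 2: the code block ends with "}" right before the closing "}}",
// so the first two braces are taken as the terminator.
preg_match($pattern, '{{code:if ($x) { f(); }}}', $m2);
echo $m2[1], "\n"; // if ($x) { f();   (the closing brace is lost)
```

In the second case a stray } is also left behind in the page after the match.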


As I've said in the comments, Asaph's answer is a good, solid regex, but it breaks down when }} is contained within the code block. Hopefully this won't be a problem, but since it is possible, it would be best to make your regex a little more expansive. If we can assume that any }} appearing between two single quotes does not signify the end of the code, as in Asaph's example of <div>{{code:$myvar = '}}';}}</div>, we can expand our regex a bit:

{{code:((?:[^']*?'[^']*?')*?[^']*?)}}

[^']*?' looks for a set of non-' characters, followed by a single quote, and [^']*?'[^']*?' looks for two of them in succession. This "swallows" strings like '}}'. We lazily look for any number of these strings, then the rest of any non-string code with [^']*?, and finally our ending }}.

This allows us to match the entire string {{code:$myvar = '}}';}} rather than just {{code:$myvar = '}}.
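As a rough check, the two patterns can be run side by side in PHP. This is only a sketch: the `/` delimiters and the variable names are assumptions, and the nowdoc string is used just to keep the sample free of escaping.

```php
<?php
// Quote-aware pattern from this answer; double quotes avoid escaping the '.
$quoteAware = "/{{code:((?:[^']*?'[^']*?')*?[^']*?)}}/";
// Simple lazy pattern from the earlier answer, for comparison.
$simple = '/{{code:([^\x00]*?)}}/';

// Nowdoc keeps the sample literal (no variable interpolation).
$subject = <<<'HTML'
<div>{{code:$myvar = '}}';}}</div>
HTML;

preg_match($simple, $subject, $short);
preg_match($quoteAware, $subject, $full);

echo $short[1], "\n"; // $myvar = '
echo $full[1], "\n";  // $myvar = '}}';
```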

There are still problems with this method, however. Escaping a quote within a string, such as in {{code:$myvar = '\'}}\'';}}, will not work, as we will "swallow" '\' first and end at the }} immediately following. It may be possible to detect these escaped single quotes as well, or to add support for double-quoted strings, but you need to ask yourself at what point using a code parser becomes the better idea.

See the entire Regex in action here. (If it doesn't match anything at first, just click the window.)


How can I use the result to, say, place it in a new <div>?

Use the replace function:

preg_replace($expression, "<div>$0</div>", $input)

$0 inserts the entire match, so the whole {{code:...}} block ends up wrapped in a new <div>. Alternatively, if you just want the actual source code, use $1, as we captured the source code in a separate capture group.
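The difference between $0 and $1 is easiest to see on a small input. The sketch below assumes `$expression` holds the quote-aware pattern from above; the sample `$input` and the result variable names are hypothetical.

```php
<?php
// Assumed: $expression is the quote-aware pattern shown earlier.
$expression = "/{{code:((?:[^']*?'[^']*?')*?[^']*?)}}/";
$input = "Before {{code:echo 'Hello World';}} after";

// $0 is the whole match, so the {{code: ... }} wrapper stays in the output.
$withWrapper = preg_replace($expression, '<div>$0</div>', $input);

// $1 is the first capture group, i.e. only the source code.
$codeOnly = preg_replace($expression, '<div>$1</div>', $input);

echo $withWrapper, "\n"; // Before <div>{{code:echo 'Hello World';}}</div> after
echo $codeOnly, "\n";    // Before <div>echo 'Hello World';</div> after
```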

Again, see the replacement here.


I went deeper down the rabbit hole...

{{code:((?:(?:[^']|\\')*?(?<!\\)'(?:[^']|\\')*?(?<!\\)')*?(?:[^']|\\')*?)}}

This won't break with escaped single-quotes, and correctly matches {{code:$myvar = '\'}}\'';}}.
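A quick way to try this in PHP is sketched below, comparing it with the previous quote-aware pattern. The `/` delimiters and variable names are assumptions; nowdoc strings are used so that every backslash reaches the regex engine exactly as written.

```php
<?php
// Nowdoc strings keep every backslash and quote exactly as written.
$escapeAware = <<<'REGEX'
/{{code:((?:(?:[^']|\\')*?(?<!\\)'(?:[^']|\\')*?(?<!\\)')*?(?:[^']|\\')*?)}}/
REGEX;
$quoteAware = <<<'REGEX'
/{{code:((?:[^']*?'[^']*?')*?[^']*?)}}/
REGEX;
$subject = <<<'HTML'
<div>{{code:$myvar = '\'}}\'';}}</div>
HTML;

preg_match($quoteAware, $subject, $before);
preg_match($escapeAware, $subject, $after);

echo $before[1], "\n"; // $myvar = '\'
echo $after[1], "\n";  // $myvar = '\'}}\'';
```

Double-quoted strings in the code block are still not handled by either pattern.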

Ta-da.


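Use

preg_match_all('/{{(.*?)}}/s', $text, $match)

where $text is the text that might contain code. This captures anything between {{ and }}: the lazy .*? stops at the first }}, the s modifier lets . match newlines, and $match[1] holds the captured contents.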
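For comparison, here is a minimal sketch of how a greedy per-character group such as (.)* behaves next to a lazy (.*?) group with the s modifier. The sample `$text` and the result variable names are made up for illustration.

```php
<?php
$text = "a {{one}} b {{two}} c";

// Greedy: runs from the first {{ to the last }}, and because the group
// repeats per character it keeps only the last character it matched.
preg_match_all('/{{(.)*}}/', $text, $greedy);

// Lazy, whole-content group; the s modifier lets . match newlines too.
preg_match_all('/{{(.*?)}}/s', $text, $lazy);

print_r($greedy[1]); // one element: "o"
print_r($lazy[1]);   // two elements: "one" and "two"
```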
