PHP Regular expression to capture code
I have been trying to capture code blocks in a similar fashion to wiki tags:
{{code:
code goes here
}}
Example code is shown below:
$strings = array('AbCd1zyZ9', 'foo!#$bar');
foreach ($strings as $testcase) {
if (ctype_alnum($testcase)) {
echo "It is The string $testcase consists of all letters or digits.\n";
} else {
echo "The string $testcase does not consist of all letters or digits.\n";
}
}
Essentially, I want to capture anything between the {{..}}. There are multiple blocks like this embedded in an HTML page.
I would appreciate any help.
Well, to start off, regex is not a good way to solve this problem. The right approach is to write a parser that understands the language's semantics and can tease out the subtleties. Having said that, if you still want a quick-and-dirty regex-based approach that works in the vast majority of cases but has a couple of acknowledged bugs (see the end of this answer), here you go:
You can use preg_match_all(). Here is a proof of concept:
$input = "
<html>
<head>
<title>{{code:echo 'Hello World';}}</title>
</head>
<body>
<h1>{{code:\$strings = array('AbCd1zyZ9', 'foo!#\$bar');
foreach (\$strings as \$testcase) {
if (ctype_alnum(\$testcase)) {
echo \"It is The string \$testcase consists of all letters or digits.\\n\";
} else {
echo \"The string \$testcase does not consist of all letters or digits.\\n\";
}
}
}}</h1>
</body>
</html>
";
$matches = array();
preg_match_all('/{{code:([^\x00]*?)}}/', $input, $matches);
print_r($matches[1]);
Outputs the following:
Array
(
[0] => echo 'Hello World';
[1] => $strings = array('AbCd1zyZ9', 'foo!#$bar');
foreach ($strings as $testcase) {
if (ctype_alnum($testcase)) {
echo "It is The string $testcase consists of all letters or digits.\n";
} else {
echo "The string $testcase does not consist of all letters or digits.\n";
}
}
)
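The character class [^\x00] is used above as a stand-in for "any character, including newlines". A minimal sketch of an equivalent pattern, assuming the PCRE s (dot-all) modifier is acceptable in your setup; the $html sample string and the variable names are made up for illustration:

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = "<p>{{code:echo 1;}}</p>\n<pre>{{code:line one\nline two}}</pre>";

// Original idiom: [^\x00] matches any character except NUL, so it spans newlines.
preg_match_all('/{{code:([^\x00]*?)}}/', $html, $viaClass);

// Alternative: with the "s" modifier, "." also matches newlines.
// The braces are escaped here only to make the intent explicit.
preg_match_all('/\{\{code:(.*?)\}\}/s', $html, $viaDotAll);

print_r($viaDotAll[1]);
```

Both calls should capture the same two blocks; the second form is arguably easier to read.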
Be careful. There are some edge-case bugs involving early termination when }} is encountered within a "code" block:
- If }} appears in a quoted string, the regex matches too early.
- If } is the last character of your "code" block and it's immediately followed by }}, you'll lose the closing } from your code block.
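Both edge cases can be reproduced with a short sketch. The two sample inputs are made up for illustration (the first mirrors the quoted-string case, the second ends the block with a closing brace):

```php
<?php
$pattern = '/{{code:([^\x00]*?)}}/';

// Case 1: "}}" inside a quoted string ends the match too early.
preg_match($pattern, '<div>{{code:$myvar = \'}}\';}}</div>', $m1);
echo $m1[1], "\n";   // $myvar = '

// Case 2: a block whose last character is "}" loses that brace,
// because the lazy match stops at the first "}}" it can find.
preg_match($pattern, '{{code:if ($x) { foo(); }}}', $m2);
echo $m2[1], "\n";   // if ($x) { foo();   (closing brace missing)
```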
As I've said in the comments, Asaph's answer is a good, solid regex, but it breaks down when }} is contained within the code block. Hopefully this won't be a problem, but since it is a possibility, it would be best to make your regex a little more expansive. If we can assume that any }} appearing between two single quotes does not signify the end of the code, as in Asaph's example of <div>{{code:$myvar = '}}';}}</div>, we can expand our regex a bit:
{{code:((?:[^']*?'[^']*?')*?[^']*?)}}
[^']*?' looks for a run of non-' characters followed by a single quote, and [^']*?'[^']*?' looks for two of them in succession. This "swallows" strings like '}}'. We lazily look for any number of these strings, then the rest of any non-string code with [^']*?, and finally our ending }}.
This allows us to match the entire string {{code:$myvar = '}}';}} rather than just {{code:$myvar = '}}.
There are still problems with this method, however. Escaping a quote within a string, such as in {{code:$myvar = '\'}}\'';}}, will not work, as we will "swallow" '\' first and end with the }} immediately following. It may be possible to detect these escaped single quotes as well, or to add support for double-quoted strings, but you need to ask yourself at what point using a code parser is a better idea.
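A minimal PHP sketch of the quote-aware pattern, showing both the case it handles and the escaped-quote case it does not. The ~ delimiter and the variable names are arbitrary choices for this example:

```php
<?php
// Quote-aware pattern from above, wrapped in "~" delimiters.
$pattern = "~{{code:((?:[^']*?'[^']*?')*?[^']*?)}}~";

// Nowdoc strings keep quotes and backslashes exactly as typed.
$plain = <<<'TXT'
<div>{{code:$myvar = '}}';}}</div>
TXT;
$escaped = <<<'TXT'
{{code:$myvar = '\'}}\'';}}
TXT;

preg_match($pattern, $plain, $m1);
echo $m1[1], "\n";   // $myvar = '}}';

preg_match($pattern, $escaped, $m2);
echo $m2[1], "\n";   // $myvar = '\'   (cut short at the escaped quote)
```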
How can I use the result to, say, place it in a new <div>?
Use the replace function:
preg_replace($expression, "<div>$0</div>", $input)
$0 inserts the entire match and places it inside a new <div> block. Alternatively, if you just want the actual source code, use $1, as we captured the source code in a separate capture group.
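As a rough sketch of the difference between $0 and $1, assuming a simple lazy pattern for brevity (any of the patterns above with one capture group should behave the same way); the sample input is made up:

```php
<?php
$expression = '/\{\{code:(.*?)\}\}/s';
$input = '<p>{{code:echo 1;}}</p>';

// $0 is the whole match, delimiters included.
$withWholeMatch = preg_replace($expression, '<div>$0</div>', $input);
echo $withWholeMatch, "\n";   // <p><div>{{code:echo 1;}}</div></p>

// $1 is only the captured source code.
$withCodeOnly = preg_replace($expression, '<div>$1</div>', $input);
echo $withCodeOnly, "\n";     // <p><div>echo 1;</div></p>
```

If the captured code may contain characters such as < or &, preg_replace_callback() combined with htmlspecialchars() is one way to escape it before it goes back into the page.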
I went deeper down the rabbit hole...
{{code:((?:(?:[^']|\\')*?(?<!\\)'(?:[^']|\\')*?(?<!\\)')*?(?:[^']|\\')*?)}}
This won't break with escaped single quotes, and it correctly matches {{code:$myvar = '\'}}\'';}}.
Ta-da.
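For anyone who wants to try it from PHP, here is a small sketch. Putting the pattern in a nowdoc avoids having to double every backslash and quote; the ~ delimiter and the variable names are arbitrary choices for this example:

```php
<?php
// Nowdoc: the pattern is taken literally, backslashes and quotes included.
$pattern = <<<'RE'
~{{code:((?:(?:[^']|\\')*?(?<!\\)'(?:[^']|\\')*?(?<!\\)')*?(?:[^']|\\')*?)}}~
RE;

$subject = <<<'TXT'
<div>{{code:$myvar = '\'}}\'';}}</div>
TXT;

if (preg_match($pattern, $subject, $m)) {
    echo $m[1], "\n";   // $myvar = '\'}}\'';
}
```

Double-quoted strings and unbalanced quotes are still not handled, so the earlier caveat about using a real parser continues to apply.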
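Use
preg_match_all('/{{(.*?)}}/s', $text, $match)
where $text is the text that might contain code. This captures anything between {{ and }}: the lazy .*? stops at the first }}, and the s modifier lets . match newlines, so multi-line blocks are captured as well.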