PHP PCRE - correct nested tags behaviour
I want to write a simple forum parser (consisting of a single preg_replace call), and I've run into problems with nested tags.
For example, when someone quotes someone who is quoting someone else, I can't get the correct behaviour.
Given this input:
[quote=Tom]
[quote=Jerry]
Lorem
[/quote]
Ipsum
[/quote]
Dolor.
I want something like this:
<blockquote>
<p><strong>Tom wrote:</strong></p>
<blockquote>
<p><strong>Jerry wrote:</strong></p>
<p>Lorem</p>
</blockquote>
Ipsum
</blockquote>
Dolor.
I have this code:
preg_replace('~\[quote (.+)\](.+)\[/quote\]~is', '<blockquote><p><strong>$1</strong> wrote:</p><p>$2</p></blockquote>', $value);
This version is greedy. If I have two separate [quote] blocks, the regex wraps all the text between the first [quote] and the second [/quote].
If I add the U modifier, it's too ungreedy: the first [quote] tag is paired with the first (nested and irrelevant) [/quote] tag.
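To see both failure modes side by side, here is a small sketch. It assumes the [quote=Name] syntax from the sample input, adds an illustrative second block ([quote=Anna]) that is not in the question, and uses ([^\]]+) for the author name instead of (.+) so that only the body quantifier differs between the two patterns.

```php
<?php
// Tom's quote (with Jerry's nested inside), then a separate, illustrative Anna quote.
$input = "[quote=Tom]\n[quote=Jerry]\nLorem\n[/quote]\nIpsum\n[/quote]\nDolor.\n"
       . "[quote=Anna]\nSit\n[/quote]";

// Same pattern twice; only the quantifier on the body group changes.
// ([^\]]+) stops the author name at the first "]".
$greedy = '~\[quote=([^\]]+)\](.+)\[/quote\]~s';
$lazy   = '~\[quote=([^\]]+)\](.+?)\[/quote\]~s'; // same effect as (.+) with the U modifier

preg_match($greedy, $input, $g);
preg_match($lazy, $input, $l);

// Greedy: Tom's body runs all the way to Anna's [/quote], swallowing both blocks.
var_dump($g[2]);
// Lazy: Tom's body stops at Jerry's nested [/quote].
var_dump($l[2]);
```

Neither quantifier can fix this on its own: a plain regex has no way to count how many [quote] tags are still open, which is why the answers below reach for a real parser or for PCRE recursion.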
Thanks for any help!
There is the PEAR HTML_BBCodeParser package, and PHP also has a PECL extension (bbcode) for parsing markup like this; see this example: http://www.php.net/manual/en/function.bbcode-create.php
Don't use a regular expression for this. Use the official PECL bbcode extension instead:
Example (lifted from the docs):
<?php
$arrayBBCode = array(
    '' => array('type' => BBCODE_TYPE_ROOT, 'childs' => '!i'),
    'i' => array('type' => BBCODE_TYPE_NOARG, 'open_tag' => '<i>',
        'close_tag' => '</i>', 'childs' => 'b'),
    'url' => array('type' => BBCODE_TYPE_OPTARG,
        'open_tag' => '<a href="{PARAM}">', 'close_tag' => '</a>',
        'default_arg' => '{CONTENT}',
        'childs' => 'b,i'),
    'img' => array('type' => BBCODE_TYPE_NOARG,
        'open_tag' => '<img src="', 'close_tag' => '" />',
        'childs' => ''),
    'b' => array('type' => BBCODE_TYPE_NOARG, 'open_tag' => '<b>',
        'close_tag' => '</b>'),
);
$text = <<<EOF
[b]Bold Text[/b]
[i]Italic Text[/i]
[url]http://www.php.net/[/url]
[url=http://pecl.php.net/][b]Content Text[/b][/url]
[img]http://static.php.net/www.php.net/images/php.gif[/img]
[url=http://www.php.net/]
[img]http://static.php.net/www.php.net/images/php.gif[/img]
[/url]
EOF;
$BBHandler = bbcode_create($arrayBBCode);
echo bbcode_parse($BBHandler, $text);
?>
The full documentation is in the BBCode section of the PHP manual.
With the help of a recursive regular expression:
function replace_quotes_callback($matches) {
$cite = empty($matches[1]) ? '' : '<p><strong>' . $matches[1] . '</strong> wrote:</p>';
return '<blockquote>' . $cite . '<p>' . replace_quotes($matches[2]) . '</p></blockquote>';
}
function replace_quotes($data) {
return preg_replace_callback('~\[quote(?:=([^\]]+))?\]((?:(?R)|.)*?)\[/quote\]~s', 'replace_quotes_callback', $data);
}
The pattern matches only the outermost quote blocks, and the callback function replace_quotes_callback replaces the quotes nested inside each one by recursively calling replace_quotes.
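Here is the recursive version applied to the question's input, written on one line to keep the expected output short. The sketch repeats the two functions from this answer unchanged so it runs on its own; (?R) lets the body group consume an entire nested [quote]...[/quote] block as a single unit, so each [quote] is paired with its own [/quote].

```php
<?php
// The answer's callback: build the citation, then recurse into the body.
function replace_quotes_callback($matches) {
    $cite = empty($matches[1]) ? '' : '<p><strong>' . $matches[1] . '</strong> wrote:</p>';
    return '<blockquote>' . $cite . '<p>' . replace_quotes($matches[2]) . '</p></blockquote>';
}

// The answer's entry point: (?R) matches a whole nested quote block inside the body.
function replace_quotes($data) {
    return preg_replace_callback('~\[quote(?:=([^\]]+))?\]((?:(?R)|.)*?)\[/quote\]~s', 'replace_quotes_callback', $data);
}

$input = "[quote=Tom][quote=Jerry]Lorem[/quote]Ipsum[/quote]Dolor.";
// Outputs:
// <blockquote><p><strong>Tom</strong> wrote:</p><p><blockquote><p><strong>Jerry</strong> wrote:</p><p>Lorem</p></blockquote>Ipsum</p></blockquote>Dolor.
echo replace_quotes($input), PHP_EOL;
```

Two things to watch with this output. The callback wraps the whole body in <p>, so the inner <blockquote> ends up inside a <p>; HTML does not allow that nesting, and browsers will close the <p> early, so dropping that wrapper (or wrapping only the non-quote text) gives cleaner markup. The author name is also inserted unescaped, so passing it through htmlspecialchars() is worth considering for user-submitted posts.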