Regex & BBCode - Perfecting Nested Quote

2022-12-31 20:34

I'm working on some BBCode for my website.

I've managed to get most of the codes working perfectly; however, the [QUOTE] tag is giving me some grief.

When I get something like this:

[QUOTE=1]
[QUOTE=2]
This is a quote from someone else
[/QUOTE]
This is someone else quoting someone else
[/QUOTE]

It will return:

> 1 said:  [QUOTE=2]This is a quote from
> someone else

This is someone else quoting someone else[/QUOTE]

So what is happening is that the [/QUOTE] from the nested quote is closing the outer quote block.

The regex I am using is:

"'\[quote=(.*?)\](.*?)\[/quote\]'is"

How can I make nested quotes render properly?

Thank you.

You could construct a recursive regular expression (available since libpcre-3.0 according to their changelog):

\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]
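As a rough check, the sketch below runs both the question's pattern and the recursive one against the question's input. The # delimiters and the variable names ($input, $flat, $nested) are assumptions made for illustration; only the two patterns come from the thread.

```php
<?php
// The nested input from the question.
$input = "[QUOTE=1]\n[QUOTE=2]\nThis is a quote from someone else\n[/QUOTE]\n"
       . "This is someone else quoting someone else\n[/QUOTE]";

// The question's pattern, written with # delimiters:
// the lazy body stops at the first [/QUOTE], which belongs to the nested quote.
preg_match('#\[quote=(.*?)\](.*?)\[/quote\]#is', $input, $flat);

// The recursive pattern: (?R) consumes a complete nested quote as one unit,
// so the outer body only ends at the outer [/QUOTE].
preg_match('#\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]#is', $input, $nested);

echo strlen($flat[0]), ' of ', strlen($input), " bytes matched without recursion\n";
echo strlen($nested[0]), ' of ', strlen($input), " bytes matched with recursion\n";
```

Here $nested[1] holds the outer attribute and $nested[2] the outer body, with the inner quote still in BBCode form.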

But it would be better if you followed @codeka's advice.

Update: (?R) here means «insert the whole regular expression at the place where (?R) occurs». So a(?R)?b is equivalent (if you forget about capturing groups) to a(a(?-1)?b)?b, which is equivalent to a(a(a(?-1)?b)?b)?b, and so on. Instead of (?R) you can use (?N), (?+N), (?-N) and (?&a), which mean «substitute with the N'th capturing group», «substitute with the N'th next capturing group», «substitute with the N'th previous capturing group» and «substitute with the capturing group named «a»».
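The a(?R)?b example can be tried directly. The following is a minimal sketch; the helper name isBalanced is hypothetical, and it uses the (?1) form so that only the group recurses and the \A and \z anchors are not pulled into the recursion.

```php
<?php
// Hypothetical helper: true when $s is n "a" characters followed by n "b" characters (n >= 1).
function isBalanced(string $s): bool
{
    // Group 1 is a(?1)?b; (?1) re-runs the pattern of group 1 at that point.
    return preg_match('/\A(a(?1)?b)\z/', $s) === 1;
}

// The (?R) form recurses into the whole pattern and finds the balanced run inside a longer string.
preg_match('/a(?R)?b/', 'xaabby', $m);

var_dump(isBalanced('aabb'), isBalanced('aab'), $m[0]);
```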

This is not really a task that regular expressions are good for. It's almost like trying to parse HTML with regular expressions, and we know what happens when you do that...

What you could do, and even then I don't think it's all that great an idea, is to use preg_split to split your input text into tags-and-non-tags. So you'll end up with a list like this:

[QUOTE=1]
(blank)
[QUOTE=2]
This is a quote from someone else
[/QUOTE]
This is someone else quoting someone else
[/QUOTE]

Then you run through the list, converting the tags to HTML and outputting the plain text unmodified. You can even get fancy and keep a "nesting" count so that if you encounter a "[/quote]" when you're not expecting it, you can handle the situation a bit better than just outputting invalid HTML. Alternatively, you just output things as you find them and let HTML Purifier or something similar clean it up later.
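The split-and-walk approach described above could look roughly like this. It is a sketch under stated assumptions, not a definitive implementation: the function name renderQuotes and the blockquote/cite markup are made up for illustration, and a stray closing tag is simply printed as text.

```php
<?php
// Hypothetical helper: converts [quote] / [quote=name] BBCode into nested <blockquote> HTML.
function renderQuotes(string $text): string
{
    // Split into tags and the text between them; the captured tags stay in the list.
    $tokens = preg_split(
        '#(\[quote(?:=[^\]]*)?\]|\[/quote\])#i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $html  = '';
    $depth = 0; // number of quotes currently open

    foreach ($tokens as $token) {
        if (preg_match('#^\[quote(?:=([^\]]*))?\]$#i', $token, $m)) {
            // Opening tag: remember that one more quote is open.
            $depth++;
            $html .= '<blockquote>';
            $name = trim($m[1] ?? '', '"');
            if ($name !== '') {
                $html .= '<cite>' . htmlspecialchars($name) . ' said:</cite> ';
            }
        } elseif (strcasecmp($token, '[/quote]') === 0) {
            if ($depth > 0) {
                $depth--;
                $html .= '</blockquote>';
            } else {
                // Unexpected closing tag: show it as literal text instead of emitting invalid HTML.
                $html .= htmlspecialchars($token);
            }
        } else {
            // Plain text between tags.
            $html .= htmlspecialchars($token);
        }
    }

    // Close any quotes that were left open.
    return $html . str_repeat('</blockquote>', $depth);
}

echo renderQuotes("[QUOTE=1][QUOTE=2]inner[/QUOTE]outer[/QUOTE]");
```

Because the nesting count lives in an ordinary variable, the work grows with the number of tokens and no regex backtracking across nested quotes is involved.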

I dealt with this problem for quite a long time and would like to close the question completely by supplementing the answer of the respected ZyX. He suggested an excellent search pattern, but as Moe said, a single pass with it does not process the nested quotes themselves.
To solve this, I used a while loop and ZyX's pattern with the preg_replace_callback function:


$text = 
'
[QUOTE=ctmcn]
    [quote="John Doe;1000"]Lorem ipsum dolor sit amet[/quote]
    Lorem ipsum dolor sit amet
[/QUOTE]
-----------------------------
[QUOTE="John Doe;103318"]
 [QUOTE]
  [QUOTE="John-Doe;103318"]
   Lorem ipsum dolor sit amet
  [/QUOTE]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet

 [QUOTE="John-Doe;103318"]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet
 [QUOTE="John-Doe;103318"]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet
[/QUOTE]
Lorem ipsum dolor sit amet
----------------------------
[QUOTE="John-Doe;103318"]Lorem ipsum dolor sit amet[/QUOTE]
----------------------------
[QUOTE="Максим;103318"]Lorem ipsum dolor sit amet[/QUOTE]
Lorem ipsum dolor sit amet
';

$text = nestedQuotes($text);
echo $text;

function nestedQuotes($text) {
    $pattern = '#\[quote=?(.*?)\](((?R)|.)*?)\[\/quote\]#is';
    // Each pass converts the outermost quotes. The quoted body ($m[2]) is kept in
    // the output, so nested quotes inside it are converted on the next pass.
    // The HTML returned below is placeholder markup; put your own code there.
    while (preg_match($pattern, $text)) {
        $text = preg_replace_callback(
            $pattern,
            function ($m) {
                $body = $m[2];
                $attr = str_replace('"', '', $m[1]);
                if ($attr === '') {
                    // Anonymous quote, like: [quote]text[/quote]
                    return '<blockquote>' . $body . '</blockquote>';
                }
                // Split "username;postid"; $qpostid stays empty when there is no ";".
                list($qname, $qpostid) = array_pad(explode(';', $attr, 2), 2, '');
                $qname = htmlspecialchars($qname);
                if ($qpostid !== '') {
                    // Quoted username with source post id, like: [quote="username;postid"]text[/quote]
                    return '<blockquote>' . $qname . ' said (post ' . htmlspecialchars($qpostid) . '): ' . $body . '</blockquote>';
                }
                // Quoted username without source post id, like: [quote=username]text[/quote]
                return '<blockquote>' . $qname . ' said: ' . $body . '</blockquote>';
            },
            $text
        );
    }
    return $text;
}

This way you can handle quotes of any type at any nesting level.
The preg_replace_callback function lets you use a function as the replacement parameter, so you can decide what to return based on the form of the original quote (whether the name of the quoted user is available, the id of the quoted message, and so on). The variable $m is an array of the captured groups.

$m[0] is the whole matched text.
$m[1] is the first captured group.
$m[2] is the second captured group...

and so on. But this method has a drawback. The search pattern is quite "greedy", and on large amounts of text I ran into a problem in PHP: the call returns no value and reports no error. If anyone can add a more efficient search pattern, that would be the perfect solution. Hope this helps somebody!
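One possible direction for the drawback described above is sketched here. It is a different technique from the recursive pattern, not the original author's method: it converts the innermost quotes first, using a pattern whose body may not contain another opening tag, and repeats until nothing is left. The silent failure is consistent with preg_replace_callback returning null when PCRE hits an error (for example an exceeded backtrack limit), so the sketch checks for null and reports preg_last_error(). The function name renderQuotesInsideOut and the markup are assumptions.

```php
<?php
// Hypothetical helper: renders quotes innermost-first, without a recursive pattern.
function renderQuotesInsideOut(string $text): string
{
    // Escape <, > and & once up front; double quotes are left alone so
    // attributes such as [quote="name;id"] can still be parsed.
    $text = htmlspecialchars($text, ENT_NOQUOTES);

    // Matches only a quote whose body contains no further opening [quote] tag.
    $innermost = '#\[quote(?:=([^\]]*))?\]((?:(?!\[quote[=\]]).)*?)\[/quote\]#is';

    do {
        $result = preg_replace_callback(
            $innermost,
            function (array $m): string {
                $name = trim($m[1], '"');
                $cite = $name !== '' ? '<cite>' . $name . ' said:</cite> ' : '';
                return '<blockquote>' . $cite . $m[2] . '</blockquote>';
            },
            $text,
            -1,
            $count
        );

        if ($result === null) {
            // PCRE gave up; surface the error code instead of failing silently.
            throw new RuntimeException('PCRE error code: ' . preg_last_error());
        }
        $text = $result;
    } while ($count > 0); // each pass exposes the next level of quotes

    return $text;
}

echo renderQuotesInsideOut("[QUOTE=1][QUOTE=2]inner[/QUOTE]outer[/QUOTE]");
```

Each pass removes one nesting level, so the loop runs about as many times as the deepest quote is nested. Unmatched tags are left in the text as they are.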
