Regex & BBCode - Perfecting Nested Quote

I'm working on some BBcode for my website.

I've managed to get most of the codes working perfectly, however the [QUOTE] tag is giving me some grief.

When I get something like this:

[QUOTE=1]
[QUOTE=2]
This is a quote from someone else
[/QUOTE]
This is someone else quoting someone else
[/QUOTE]

It will return:

> 1 said:  [QUOTE=2]This is a quote from
> someone else

This is someone else quoting someone else[/QUOTE]

So what is happening is the [/quote] from the nested quote is closing the quote block.

The Regex I am using is:

"'\[quote=(.*?)\](.*?)\[/quote\]'is"

How can I make it so nested Quotes will appear properly?

Thank you.


You could construct a recursive regular expression (available since libpcre-3.0 according to its changelog):

\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]

But it would be better to follow @codeka's advice.

Update: (?R) here means «insert the whole regular expression at the place where (?R) occurs». So a(?R)?b is equivalent (if you ignore capturing groups) to a(a(?-1)?b)?b, which is equivalent to a(a(a(?-1)?b)?b)?b, and so on. Instead of (?R) you can use (?N), (?+N), (?-N) and (?&a), which mean, respectively, «substitute the N-th capturing group», «substitute the N-th next capturing group», «substitute the N-th previous capturing group» and «substitute the capturing group named «a»». In each case it is the group's subpattern that is inserted, not the text the group matched.
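The recursion described above can be tried out with a few preg_match calls. This is only a small sketch: the test strings and variable names are made up for illustration, and the last call applies the pattern from this answer to the input from the question.

```php
<?php
// (?R) recurses into the whole pattern: n 'a's followed by n 'b's.
preg_match('/a(?R)?b/', 'xxaabbyy', $m);
echo $m[0], "\n"; // aabb

// (?1) recurses into capturing group 1 only, so the anchors stay outside the recursion.
var_dump(preg_match('/^(a(?1)?b)$/', 'aaabbb')); // int(1)
var_dump(preg_match('/^(a(?1)?b)$/', 'aabbb'));  // int(0), unbalanced

// The quote pattern from this answer, applied to the input from the question.
$input = "[QUOTE=1]\n[QUOTE=2]\nThis is a quote from someone else\n[/QUOTE]\n"
       . "This is someone else quoting someone else\n[/QUOTE]";
preg_match('#\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]#is', $input, $q);
var_dump($q[0] === $input); // bool(true): the outer quote is matched as a whole
echo $q[1], "\n";           // 1
```

Note that a single match only captures the outermost quote; the inner [QUOTE=2] is still raw text inside the second capturing group.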


This is not really a task that regular expressions are good for. It's almost like trying to parse HTML with regular expressions, and we know what happens when you do that...

What you could do, and even then I don't think it's all that great an idea, is to use preg_split to split your input text into tags-and-non-tags. So you'll end up with a list like this:

  • [QUOTE=1]
  • (blank)
  • [QUOTE=2]
  • This is a quote from someone else
  • [/QUOTE]
  • This is someone else quoting someone else
  • [/QUOTE]

Then you run through the list, converting the tags to HTML and outputting the plain text unmodified. You can even get fancy and keep "nesting" counts so that if you encounter a "[/quote]" when you're not expecting it, you can handle the situation a bit better than just outputting invalid HTML. Alternatively, you just output things as you find them and let HTML Purifier or something clean it up later.
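A minimal sketch of that approach might look like the following. The function name quotesToHtml and the blockquote/cite markup are my own assumptions, not something taken from the question; unexpected closing tags are simply printed as text and unclosed quotes are closed at the end.

```php
<?php
// Hypothetical helper: converts [quote] tags to <blockquote> using a nesting counter.
function quotesToHtml(string $text): string
{
    // Split into tags and non-tags; DELIM_CAPTURE keeps the tags in the list.
    $parts = preg_split(
        '#(\[quote(?:=[^\]]*)?\]|\[/quote\])#i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $depth = 0; // number of quotes currently open
    $html = '';
    foreach ($parts as $part) {
        if (preg_match('#^\[quote(?:=([^\]]*))?\]$#i', $part, $m)) {
            // Opening tag, with or without an attribute.
            $depth++;
            $author = trim($m[1] ?? '', '"');
            $html .= '<blockquote>';
            if ($author !== '') {
                $html .= '<cite>' . htmlspecialchars($author) . ' said:</cite> ';
            }
        } elseif (strcasecmp($part, '[/quote]') === 0) {
            if ($depth > 0) {
                $depth--;
                $html .= '</blockquote>';
            } else {
                // Unexpected closing tag: keep it as literal text.
                $html .= htmlspecialchars($part);
            }
        } else {
            // Plain text between tags.
            $html .= htmlspecialchars($part);
        }
    }
    // Close any quotes that were left open.
    return $html . str_repeat('</blockquote>', $depth);
}

echo quotesToHtml("[QUOTE=1][QUOTE=2]inner[/QUOTE]outer[/QUOTE]"), "\n";
```

Because the tags are handled one at a time, the nesting depth never matters to the regular expression itself, which is the main point of this approach.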


I dealt with this problem for quite a long time and would like to close the question completely by supplementing ZyX's answer. He suggested an excellent pattern, but as Moe said, it does not process the nested quotes themselves.
To solve this, I used a while loop and ZyX's pattern with the preg_replace_callback function:


$text = 
'
[QUOTE=ctmcn]
    [quote="John Doe;1000"]Lorem ipsum dolor sit amet[/quote]
    Lorem ipsum dolor sit amet
[/QUOTE]
-----------------------------
[QUOTE="John Doe;103318"]
 [QUOTE]
  [QUOTE="John-Doe;103318"]
   Lorem ipsum dolor sit amet
  [/QUOTE]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet

 [QUOTE="John-Doe;103318"]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet
 [QUOTE="John-Doe;103318"]
 Lorem ipsum dolor sit amet
 [/QUOTE]
Lorem ipsum dolor sit amet
[/QUOTE]
Lorem ipsum dolor sit amet
----------------------------
[QUOTE="John-Doe;103318"]Lorem ipsum dolor sit amet[/QUOTE]
----------------------------
[QUOTE="Максим;103318"]Lorem ipsum dolor sit amet[/QUOTE]
Lorem ipsum dolor sit amet
';

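$text = nestedQuotes($text);
echo $text;

function nestedQuotes($text) {
    // (?R) lets the pattern match a complete nested [quote]...[/quote] pair inside the body.
    $pattern = '#\[quote=?(.*?)\](((?R)|.)*?)\[\/quote\]#is';
    // Each pass converts the outermost quotes; loop until none are left.
    while (preg_match($pattern, $text)) {
        $text = preg_replace_callback(
            $pattern,
            function ($m) {
                // $m[1] is the attribute (empty for [quote]), $m[2] is the quote body.
                // A real replacement should contain $m[2] so that nested quotes are converted
                // on the next pass, and must not itself contain a [quote]...[/quote] pair.
                if ($m[1]) {
                    if (strpos($m[1], ';') !== false) {
                        list($qname, $qpostid) = str_replace('"', '', explode(';', $m[1]));
                        // e.g. [quote="username;"]text[/quote]
                        if ($qname && !$qpostid) return 'here code with quoted username and without source post_id';
                        // e.g. [quote="username;postid"]text[/quote]
                        if ($qname && $qpostid) return 'here code with quoted username and with source post_id';
                        // e.g. [quote=";postid"]text[/quote]: no username, so treat it as anonymous
                        return 'here anonymous quote';
                    } else {
                        $qname = str_replace('"', '', $m[1]);
                        // e.g. [quote=username]text[/quote]
                        return 'here code with quoted username and without source post_id';
                    }
                } else {
                    // e.g. [quote]text[/quote]
                    return 'here anonymous quote';
                }
            },
            $text
        );
    }
    return $text;
}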

This way you can handle quotes of any type at any nesting level.
The preg_replace_callback function accepts a function as the replacement parameter, so we can decide what to return based on the form of the original quote (whether the name of the quoted user is available, the id of the quoted message, and so on). The variable $m is an array of the captured groups.

$m[0] is the entire match.
$m[1] is the first captured group.
$m[2] is the second captured group...

and so on. However, this method has a drawback. The pattern is quite "greedy" with resources, and on large amounts of text I ran into a problem in PHP: the function returns no value and reports no error. If anyone can add a more efficient pattern, that would be the perfect solution. Hope this helps somebody!
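One plausible explanation for the silent failure, which I have not confirmed for this exact case, is that PCRE hit one of its limits (the pcre.backtrack_limit ini setting or the JIT stack). When that happens preg_replace_callback returns null without raising a warning, and the reason is only available through preg_last_error(). The sketch below makes such a failure visible; the wrapper name replaceOrFail and the blockquote replacement are hypothetical.

```php
<?php
// Hypothetical wrapper: run preg_replace_callback and turn a silent
// PCRE failure into an exception.
function replaceOrFail(string $pattern, callable $callback, string $text): string
{
    $result = preg_replace_callback($pattern, $callback, $text);
    if ($result === null) {
        // Map the numeric code from preg_last_error() to a readable name.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException('PCRE error: ' . ($names[$code] ?? "code $code"));
    }
    return $result;
}

$pattern = '#\[quote=?(.*?)\](((?R)|.)*?)\[\/quote\]#is';
try {
    echo replaceOrFail($pattern, function ($m) {
        return '<blockquote>' . $m[2] . '</blockquote>';
    }, '[quote=a]hello[/quote]'), "\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

If the reported error turns out to be a limit error, the options are raising the limit in php.ini, or avoiding the recursive pattern altogether by splitting the text into tags and walking through them.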
