Regex for bbcode seems to fail on long sentences
I need some help with my BBCode replacing. Right now I'm doing the following to find and replace BBCode:
$bbMatch[0] = '/(\[b\])(.*)(\[\/b\])/';
$bbReplace[0] = '<strong>${2}</strong>';
$bbMatch[1] = '/(\[url\])(.*)(\[\/url\])/';
$bbReplace[1] = '[url=${2}]${2}[/url]';
$bbMatch[2] = '/(\[url=)(.+)(\])(.+)(\[\/url\])/';
$bbReplace[2] = '<a href="${2}" target="_blank">${4}</a>';
$bbMatch[3] = '/(\[s\])(.*)(\[\/s\])/';
$bbReplace[3] = '<span style="text-decoration: line-through;">${2}</span>';
$bbMatch[4] = '/(\[u\])(.*)(\[\/u\])/';
$bbReplace[4] = '<span style="text-decoration: underline;">${2}</span>';
$bbMatch[5] = '/(\[i\])(.*)(\[\/i\])/';
$bbReplace[5] = '<em>${2}</em>';
// Remove bad characters
$text = htmlspecialchars($text);
// Parse BBCode
$text = preg_replace($bbMatch, $bbReplace, $text);
The problem here is that when a long sentence is run through this, it fails to find the correct end tag. For example, it would show this:
"Some text in italics[/i] with some words here [i]also text in italics
As you can see, it shows the end tag of the first one and the begin tag of the second. How would I fix this?
Your problem is that regex quantifiers are greedy by default, so .* grabs everything between the first [i] and the last [/i]. Since you told it to match any characters between those two tags, and it tries to match as many as it can, it will gladly swallow an inner [/i] and [i] as long as there is a surrounding [i]..[/i]. You just need to add a ? after the * to make it non-greedy, for example:
$bbMatch[5] = '/(\[i\])(.*?)(\[\/i\])/';
$bbReplace[5] = '<em>${2}</em>';
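To see the difference side by side, here is a minimal sketch that runs a greedy and a non-greedy pattern over the same input. The sample string is made up for illustration (modeled on the output shown in the question), and the capture groups are trimmed to the one that is actually used.

```php
<?php
// Sample input with two separate italic spans (an assumption, for illustration only).
$text = '[i]Some text in italics[/i] with some words here [i]also text in italics[/i]';

// Greedy: .* runs to the LAST [/i], so the inner [/i] and [i] are swallowed.
$greedy = preg_replace('/\[i\](.*)\[\/i\]/', '<em>$1</em>', $text);

// Non-greedy: .*? stops at the FIRST [/i] that follows each [i].
$lazy = preg_replace('/\[i\](.*?)\[\/i\]/', '<em>$1</em>', $text);

echo $greedy, "\n";
// <em>Some text in italics[/i] with some words here [i]also text in italics</em>
echo $lazy, "\n";
// <em>Some text in italics</em> with some words here <em>also text in italics</em>
```

The greedy result reproduces the stray [/i] and [i] from the question; the non-greedy one wraps each span separately.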
You'll want to change all your regexes like that btw, not just your italics.
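As a rough sketch of what that could look like for the whole set, the patterns from the question are rewritten below with non-greedy quantifiers. The function name renderBbcode and the pattern => replacement array layout are my own choices for the example, not something from the question, and the unused capture groups around the tags are dropped.

```php
<?php
// Hypothetical helper; the rules mirror the question's patterns,
// with every * and + made non-greedy by a trailing ?.
function renderBbcode(string $text): string
{
    $rules = [
        '/\[b\](.*?)\[\/b\]/'           => '<strong>$1</strong>',
        // Normalise [url]x[/url] to [url=x]x[/url] so the next rule handles both forms.
        '/\[url\](.*?)\[\/url\]/'       => '[url=$1]$1[/url]',
        '/\[url=(.+?)\](.+?)\[\/url\]/' => '<a href="$1" target="_blank">$2</a>',
        '/\[s\](.*?)\[\/s\]/'           => '<span style="text-decoration: line-through;">$1</span>',
        '/\[u\](.*?)\[\/u\]/'           => '<span style="text-decoration: underline;">$1</span>',
        '/\[i\](.*?)\[\/i\]/'           => '<em>$1</em>',
    ];

    // Escape HTML first, as in the question, then apply the rules in order.
    $text = htmlspecialchars($text);

    return preg_replace(array_keys($rules), array_values($rules), $text);
}

$html = renderBbcode('[b]one[/b] and [b]two[/b], see [url]http://example.com[/url]');
echo $html, "\n";
// <strong>one</strong> and <strong>two</strong>, see <a href="http://example.com" target="_blank">http://example.com</a>
```

When preg_replace is given arrays, it applies the patterns one after another to the running result, which is why the [url] rule can feed the [url=...] rule that follows it.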
Here is an example of greedy vs. non-greedy regex: http://www.exampledepot.com/egs/java.util.regex/Greedy.html