PHP regex...getting first instance of end bracket?

Hi, I am trying to parse some homemade BBCode I came up with and am having a difficult time with something. I am new to regex but thought this would be a great way to teach myself.

[%url=http://google.com]google link[/url%]

<a href='http://google.com'>google link</a>

[%video=http://youtube.com?v=blah]

I will run the link through an automatic embed function developed in PHP. I just need to parse the link.

[%PAGEBREAK%]

<hr>

[%img=wateva.jpg%]

<img src='wateva.jpg'>

So far I have done the URL one, which worked great (see below):

$url_pattern = "/\[\s*%\s*(URL|url)\s*=\s*(.*)\](.*)\[\s*\/\s*(URL|url)\s*%\s*\]/i";
$description = preg_replace($url_pattern, "<a href='$2'>$3</a>", $description);

But when I tried to do the image one (see below):

$img_pattern ="/\[\s*%\s*(IMG|img)=(.*)\s*(%\s*\])/i";
$description = preg_replace($img_pattern, "<img src=\'$2\' style='width: 700px; height: auto; display:block;\'>", $description);

It picks up the last "%]" of the whole text instead of the closest "%]". How do I tell it to find the closest %]?

Here is my test text:

*100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links..

[%PAGEBREAK%]

[%IMG=uploads/06-26-11/Cog.gif%]

just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.

This is a [%URL=http://google.com]link[/URL%]

Here is a video by gang gang dance

[%VIDEO=http://www.youtube.com/watch?v=lZMFwKVjV5s%]*


The problem is most likely the .* in /\[\s*%\s*(IMG|img)=(.*)\s*(%\s*\])/i. The * quantifier is greedy: .* first matches as much as it can (to the end of the line, or to the end of the document when . can match newlines), and then backtracks to the last %] it can find. Normally the problem would be hidden as long as each tag sits on its own line, unless you've set the /s flag (also called the Dot-All flag), which causes . to match newlines.
A simple solution is to use a lazy quantifier: .*? matches nothing at first, and then expands to match more and more characters until it finds the first %]:

/\[\s*%\s*(img)=(.*?)\s*(%\s*\])/i
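To make the difference concrete, here is a small sketch that runs the greedy and the lazy pattern over the same input. The sample string and file names are made up for illustration, and the replacement is simplified to a bare img tag.

```php
<?php
// Hypothetical input: two image tags on the same line, so the greedy
// pattern can run from the first tag all the way to the last "%]".
$text = '[%IMG=a.gif%] some text [%IMG=b.gif%]';

$greedy = '/\[\s*%\s*(img)=(.*)\s*(%\s*\])/i';   // original quantifier
$lazy   = '/\[\s*%\s*(img)=(.*?)\s*(%\s*\])/i';  // lazy quantifier

// Single-quoted replacement so that $2 reaches preg_replace untouched.
$greedyResult = preg_replace($greedy, '<img src="$2">', $text);
$lazyResult   = preg_replace($lazy, '<img src="$2">', $text);

echo $greedyResult, "\n"; // <img src="a.gif%] some text [%IMG=b.gif">
echo $lazyResult, "\n";   // <img src="a.gif"> some text <img src="b.gif">
```

The greedy version swallows everything between the first tag and the last %], which matches the behaviour described in the question.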

A better option is to define what alphabet is acceptable in img tags. For example, anything other than a ] or a newline:

/\[\s*%\s*(img)=([^\]\n\r]*)\s*(%\s*\])/i
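As a rough check of the character-class version, the sketch below applies it to a shortened version of the test text from the question, with the /s flag added on purpose to show that the match still stops at the first ]. The shortened text and the simplified replacement are assumptions for the example.

```php
<?php
// Shortened version of the test text, with newlines between the tags.
$text = "[%IMG=uploads/06-26-11/Cog.gif%]\n"
      . "This is a [%URL=http://google.com]link[/URL%]\n"
      . "[%VIDEO=http://www.youtube.com/watch?v=lZMFwKVjV5s%]";

// The class [^\]\n\r] can never cross a "]" or a line break, so the
// match stays inside one tag even when /s lets "." match newlines.
$pattern = '/\[\s*%\s*(img)=([^\]\n\r]*)\s*(%\s*\])/is';

$result = preg_replace($pattern, '<img src="$2">', $text);

echo $result, "\n";
// First line becomes: <img src="uploads/06-26-11/Cog.gif">
// The URL and VIDEO tags are left untouched by this pattern.
```

Because the class itself excludes ], the pattern does not depend on lazy matching or on which flags happen to be set.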

See also: Laziness Instead of Greediness

You probably want to fix the other patterns as well, since they share the same problem.
Finally, I'd advise looking at the implementation of an existing BBCode parser. These codes can have nested constructs (for example, an image in a link in a blockquote), making them tricky to parse correctly.
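For instance, the URL pattern from the question could be adjusted along the same lines. This is only one possible sketch: it uses a negated class for the address and a lazy .*? for the link text, keeps the group numbers so that $2 and $3 still mean the same thing, and runs on a made-up sample string.

```php
<?php
// Hypothetical input with two links on one line.
$text = 'See [%URL=http://google.com]google[/URL%] and '
      . '[%URL=http://example.com]example[/URL%]';

// Group 2: the address, anything up to the first "]".
// Group 3: the link text, matched lazily up to the nearest closing tag.
$urlPattern = '/\[\s*%\s*(url)\s*=\s*([^\]\n\r]*)\](.*?)\[\s*\/\s*(url)\s*%\s*\]/i';

$result = preg_replace($urlPattern, '<a href="$2">$3</a>', $text);

echo $result, "\n";
// See <a href="http://google.com">google</a> and <a href="http://example.com">example</a>
```

With the original greedy groups, the same input would have produced a single link spanning both tags.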
