
(.*) instead of (.*?)

Suppose we have this HTML content and want to extract Content1, Content2, ... with a regular expression.

<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>

If I use the line below:

preg_match_all('/<li>(.*)<\/li>/', $text, $result);

I get an array with a single row containing:

Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4

And by using this code:

preg_match_all('/<li>(.*?)<\/li>/', $text, $result);

I get an array with 4 rows containing Content1, Content2, ...

Why does (.*) not work here, given that it means "match any character zero or more times"?


* matches in a greedy fashion, *? matches in a non-greedy fashion.

What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
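The difference can be reproduced with a short script. This is only a minimal sketch: it assumes the four items sit on a single line, because without the s modifier . does not match a newline and the greedy version could not run across line breaks. The variable names $greedy and $lazy are chosen here for illustration.

```php
<?php
// Assumption: all <li> items are on one line. By default "." does not
// match "\n", so the greedy pattern can only span items on the same line.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </li>.
preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);

// Non-greedy: .*? expands one character at a time and stops at the FIRST </li>.
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

print_r($greedy[1]); // a single element spanning all four items
print_r($lazy[1]);   // four elements, one per item
```

Index 1 of each result array holds the contents of the first capture group, which is why the script prints $greedy[1] and $lazy[1] rather than the whole match at index 0.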


Because .* itself is greedy and eats up as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*?, on the other hand, is not greedy and eats up as little as possible (stopping at the first </li>).


See this article's section about the greediness of regular expressions.
