Regular Expression: Is there a way to tell preg_match_all to use the third match it finds skipping the first two?

Is there a way to tell preg_match_all to use the third match it finds, skipping the first two? For example, I have the following HTML:

<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>

I need preg_match_all to get the contents of the outermost div, and not stop at the first </div> it encounters.


You would be much better served by something like an XML/HTML parser. See here.
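
As a rough sketch of the parser route, the snippet below uses PHP's bundled DOM extension (DOMDocument and DOMXPath) on the HTML from the question. Selecting the outer div by class="entry" is an assumption taken from that sample; a real page may need a different query.

```php
<?php
// Sketch only: parse the fragment from the question instead of matching it with a regex.
$html = <<<HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>; that does not affect the query below.
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Assumption: the outer div is the one whose class attribute is exactly "entry".
$entry = $xpath->query('//div[@class="entry"]')->item(0);

// Serialize every child node to get the full inner markup, nested divs included.
$inner = '';
foreach ($entry->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo trim($inner), "\n";
```

Because the parser tracks nesting itself, the inner closing tags cannot end the match early.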


This is the class of problem that regular expressions theoretically cannot handle: recursively defined structures. Extended REs might be able to sort of do it, but (to mix metaphors) it's better to punt and pick up a different tool.

Having said that, PCRE specifically has a recursive pattern feature. The typical demonstration is \((a*|(?R))*\), which matches any combination of balanced parentheses and the letter a. You can probably adapt that, but this is something I wouldn't try to do with REs.

Update: I'm not sure how useful this will be, but:

php > $t = "<div> how <div> now is the time </div>  now </div>";
php > preg_match('/<div>(.*|(?R))*<\/div>/',$t,$m); print_r($m);
Array
(
    [0] => <div> how <div> now is the time </div>  now </div>
    [1] => 
)
php > 
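
In the session above, the greedy .* can already stretch from the first <div> to the last </div> on its own, so the recursive branch is probably not doing the work. A tighter variant, offered only as a sketch, lets the text branch stop at any div tag so that nested divs have to go through (?R). It assumes well-formed, balanced div tags and uses the HTML from the question as input.

```php
<?php
// Sketch only: PCRE recursion over balanced <div> ... </div> pairs.
$html = <<<HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

// <div\b[^>]*>          an opening div tag with optional attributes
// (?:(?!</?div\b).)++   text that does not start an opening or closing div tag
// (?R)                  a complete nested div, matched by re-entering the whole pattern
// </div>                the closing tag that balances the opening one
$pattern = '~<div\b[^>]*>((?:(?:(?!</?div\b).)++|(?R))*)</div>~s';

preg_match_all($pattern, $html, $m);

// $m[0] holds each outermost div, $m[1] the content between its tags.
echo count($m[0]), "\n";
echo trim($m[1][0]), "\n";
```

Unbalanced or malformed markup will make this fail in ways a parser would tolerate, which is the main reason to prefer one.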


You can use XPath's "axis specifiers" and "node set functions".
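
To make that concrete, here is a small sketch with PHP's DOMXPath, again on the HTML from the question. The two queries are illustrative choices rather than the only way to phrase it: the ancestor axis picks out divs that are not nested in another div, and position() picks the third div in document order.

```php
<?php
// Sketch only: XPath axes and node-set functions via PHP's DOM extension.
$html = <<<HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Axis specifier: keep only divs that have no div ancestor, i.e. the outermost ones.
$outer = $xpath->query('//div[not(ancestor::div)]');

// Node-set function: position() selects the third div in document order.
$third = $xpath->query('(//div)[position() = 3]');

echo $outer->item(0)->getAttribute('class'), "\n"; // prints "entry"
echo $third->item(0)->textContent, "\n";           // prints "2009-10-31"
```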
