Regular Expression: Is there a way to tell preg_match_all to use the third match it finds skipping the first two?
Is there a way to tell preg_match_all to use the third match it finds skipping the first two? For example, I have the following HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
I need preg_match_all to get the contents of the outermost div, and not stop at the first </div> it encounters.
You would be much better served by something like an XML/HTML parser. See here.
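As a rough illustration of the parser route, here is a minimal sketch that assumes PHP's bundled DOM extension (DOMDocument and DOMXPath) is available. The markup and class names are taken from the question; the variable names are only illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the outer div by its class attribute.
$entry = $xpath->query('//div[@class="entry"]')->item(0);

// Read the nested values relative to that node.
$text = $xpath->evaluate('string(div[@class="text"])', $entry);
$date = $xpath->evaluate('string(div[@class="date"])', $entry);

echo $text, "\n";
echo $date, "\n";

// Serialize the whole outer div, nested divs included.
echo $doc->saveHTML($entry), "\n";
```

Because the parser builds a tree, the nesting depth of the inner divs does not matter, which is exactly where a plain regex runs into trouble.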
This is the class of problem that regular expressions, in the strict theoretical sense, cannot handle: recursively defined (nested) structures. Extended REs might be able to sort-of do it, but (to mix metaphors) it's better to punt and pick up a different tool.
Having said that, PCRE specifically has a recursive pattern feature. The typical demonstration is \((a*|(?R))*\), which can handle any combination of balanced parens and a's. So you can probably adapt that, but you are trying to do something that I wouldn't try to do with REs.
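To make the demonstration pattern concrete, the sketch below wraps it in a hypothetical helper, is_single_balanced_group(), which is not part of PHP or of the original answer. Since (?R) recurses into the entire pattern, anchors cannot simply be added to it, so the helper compares the matched text with the whole input instead.

```php
<?php
// Hypothetical helper: true when $s is exactly one balanced group of
// parentheses containing only the letter "a" and nested groups.
function is_single_balanced_group(string $s): bool
{
    // (?R) recurses into the entire pattern, so each nested "(" ... ")"
    // has to be balanced in turn.
    if (preg_match('/\((a*|(?R))*\)/', $s, $m) !== 1) {
        return false;
    }
    // The match may cover only part of the input; require all of it.
    return $m[0] === $s;
}

var_dump(is_single_balanced_group('(aa(a)(()a))'));
var_dump(is_single_balanced_group('((a)'));
```

The first call should report true and the second false, because in "((a)" only the inner "(a)" is balanced.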
Update: I'm not sure how useful this will be, but:
php > $t = "<div> how <div> now is the time </div>  now </div>";
php > preg_match('/<div>(.*|(?R))*<\/div>/',$t,$m); print_r($m);
Array
(
    [0] => <div> how <div> now is the time </div>  now </div>
    [1] => 
)
php > 
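Note that in the session above the greedy .* already runs to the last </div> on its own, so the (?R) branch is never exercised (which is also why [1] comes back empty), and the bare <div> would not match a tag with attributes such as <div class="entry">. The following is a sketch of a pattern that does rely on recursion. The helper name outermost_divs() is made up for illustration, and the pattern assumes reasonably clean markup (no "<div" inside comments, scripts, or attribute values).

```php
<?php
// Hypothetical helper: returns every outermost <div>...</div> block in $html.
function outermost_divs(string $html): array
{
    // <div\b[^>]*>            an opening div tag, attributes allowed
    // (?:(?!</?div\b).)++     text that does not start a div open/close tag
    // (?R)                    or a complete nested div (whole-pattern recursion)
    // </div>                  the close tag belonging to the opening tag
    $pattern = '~<div\b[^>]*>((?:(?:(?!</?div\b).)++|(?R))*)</div>~s';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

$blocks = outermost_divs($html);
echo count($blocks), "\n";
echo $blocks[0], "\n";
```

For the sample markup this should yield a single match spanning the whole entry div, with both inner divs inside it. It is still fragile compared with a real parser.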
You can use XPath's "Axis specifiers" and "node set functions"
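A short sketch of what that could look like with PHP's DOMXPath, assuming the DOM extension is installed. The queries are illustrative choices rather than anything given in the answer: the ancestor:: axis picks out the outermost divs, position() picks the third div in document order, and count() is one of the node set functions.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Axis specifier: a div with no div among its ancestors is an outermost div.
$outer = $xpath->query('//div[not(ancestor::div)]');

// Node set function position(): the third div in document order.
$third = $xpath->query('(//div)[position() = 3]')->item(0);

// Node set function count(): how many divs there are in total.
$total = $xpath->evaluate('count(//div)');

echo $outer->item(0)->getAttribute('class'), "\n";
echo $third->textContent, "\n";
echo $total, "\n";
```

This addresses both readings of the question: "the outermost div" through the axis test, and "the third match" through the positional predicate.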
 