Regular Expression: Is there a way to tell preg_match_all to use the third match it finds skipping the first two?
Is there a way to tell preg_match_all to use the third match it finds, skipping the first two? For example, I have the following HTML:
<div class="entry">
<div class="text">BlaBlaBla</div>
<div class="date">2009-10-31</div>
</div>
I need preg_match_all to get the contents of the outermost div, and not stop at the first </div> it encounters.
You would be much better served by something like an XML/HTML parser. See here.
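To make the parser suggestion concrete, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). It assumes the markup from the question is held in a variable I've called $html, and that the div you want is the one with class "entry"; both names are just for illustration.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
<div class="text">BlaBlaBla</div>
<div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);            // the parser wraps the fragment in <html><body>

$xpath = new DOMXPath($doc);
$entry = $xpath->query('//div[@class="entry"]')->item(0);

// Serialize every child node to rebuild the "inner HTML" of the outer div.
$inner = '';
foreach ($entry->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo trim($inner), "\n";
```

Because the parser tracks nesting for you, it makes no difference how many inner </div> tags come before the one that closes the outer div.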
This is the class of problem that regular expressions, in the strict theoretical sense, cannot handle: recursively defined structures such as nested tags. Extended REs might be able to sort of do it, but (to mix metaphors) it's better to punt and pick up a different tool.
Having said that, PCRE specifically has a recursive pattern feature. The typical demonstration is
\((a*|(?R))*\)
which can handle any combination of balanced parens and "a"s. So you can probably adapt that, but you are trying to do something that I wouldn't try to do with REs.
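For what it's worth, here is a rough sketch of that balanced-parens idea in PHP. It is a variant of the pattern above rather than the exact one: it recurses into group 1 with (?1) instead of the whole pattern with (?R), so the pattern can be anchored with ^ and $, and it uses a++ so the repeated group cannot match the empty string. The function name isBalanced is made up for the example.

```php
<?php
// Returns true when $s consists only of balanced parentheses and "a"s,
// e.g. "(a(aa)())". (?1) re-enters capture group 1 recursively.
function isBalanced(string $s): bool
{
    return preg_match('/^(\((?:a++|(?1))*\))$/', $s) === 1;
}

var_dump(isBalanced('(a(aa)())')); // bool(true)
var_dump(isBalanced('(a(aa)'));    // bool(false)
var_dump(isBalanced('(a))'));      // bool(false)
```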
Update: I'm not sure how useful this will be, but:
php > $t = "<div> how <div> now is the time </div> now </div>";
php > preg_match('/<div>(.*|(?R))*<\/div>/',$t,$m); print_r($m);
Array
(
    [0] => <div> how <div> now is the time </div> now </div>
    [1] =>
)
php >
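If you do want to go down this road with preg_match_all, a somewhat stricter sketch is below. The pattern is my own adaptation, not something from the PCRE docs: it allows attributes on the opening tag, refuses to let the "plain text" branch swallow a <div or </div, and captures the contents of the outermost div in group 1. It will still break on comments, unbalanced markup, or "<div" appearing inside an attribute value, so treat it as a demonstration only.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
<div class="text">BlaBlaBla</div>
<div class="date">2009-10-31</div>
</div>
HTML;

// <div ...>  then any mix of non-div text or a nested div via (?R)  then </div>
// s: dot matches newlines, i: case-insensitive tag names.
$pattern = '~<div\b[^>]*>((?:(?:(?!</?div\b).)++|(?R))*)</div>~si';

preg_match_all($pattern, $html, $m);

// $m[0] holds the whole outer div, $m[1] holds its contents.
echo count($m[0]), "\n";
echo trim($m[1][0]), "\n";
```

Only one match comes back because the two inner divs are consumed by the recursion while the outer div is being matched.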
You can use XPath's "axis specifiers" and "node set functions".
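A small sketch of what that could look like with PHP's DOMXPath, assuming the question's markup is in a variable called $html (my name, not the asker's). position(), last() and count() are node set functions; ancestor:: is an axis specifier.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="entry">
<div class="text">BlaBlaBla</div>
<div class="date">2009-10-31</div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Node set function: pick the third div in document order, skipping the first two.
$third = $xpath->query('(//div)[position() = 3]')->item(0);
echo $third->textContent, "\n";          // 2009-10-31

// Axis specifier: climb from the date div to its outermost enclosing div.
$outer = $xpath->query('//div[@class="date"]/ancestor::div[last()]')->item(0);
echo $outer->getAttribute('class'), "\n"; // entry

// count() returns a number, so evaluate() is used instead of query().
$total = $xpath->evaluate('count(//div)');
echo $total, "\n";                        // 3
```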