A RegEx problem in PHP
I could not think of a proper title. I have some data like this:
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;
The above is just an example. The real data looks like this:
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
Sample: http://pastebin.com/cxzZWDZ8
Now I apply the following RegEx.
preg_match_all("%<strong>(.*?)<strong>%s", $data, $all);
This matches HHHHH and RRRRRRR, but I also want it to match TTTTT. How can I do this?
You could use a lookahead assertion to ensure the <strong> is there without making it part of the match (so it can be part of the next match):
</strong>(.*?)(?=<strong>)
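As a minimal sketch, here is the lookahead idea applied to the simplified sample from the question. Because that sample has no closing tags, the pattern below starts with <strong> rather than </strong>; that adaptation, the trim() step and the $blocks name are assumptions for illustration, not part of the original snippet.

```php
<?php
// Simplified sample from the question: bare <strong> tags act as separators.
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

// (?=<strong>) only checks that the next tag follows; it does not consume it,
// so the same tag can open the next match.
preg_match_all('%<strong>(.*?)(?=<strong>)%s', $data, $all);

// Strip the newlines surrounding each captured block.
$blocks = array_map('trim', $all[1]);

print_r($blocks); // HHHHH, TTTTT, RRRRRRR
```

Without the lookahead, each match swallows the tag that the following block needs as its opener, which is why every second block was skipped.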
However, if what you've got is HTML, you should use an HTML parser to read it rather than regex, which is infamously poor at parsing HTML/XML markup. With DOMDocument::loadHTML(), getElementsByTagName() and so on, you'll have a much more reliable way of scraping page data.
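A rough sketch of the parser route, assuming the input looks like the "real" data in the question. The wrapping <div>, the $result array and the rule "take the text nodes that directly follow each <strong>" are assumptions for illustration; real pages will probably need a different traversal.

```php
<?php
// Hypothetical input shaped like the "real" data in the question.
$html = <<<EOD
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
EOD;

// Collect parser warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Wrap the fragment in a <div> so all nodes share one parent element.
$doc->loadHTML('<div>' . $html . '</div>');
libxml_clear_errors();

$result = [];
foreach ($doc->getElementsByTagName('strong') as $strong) {
    $title = trim($strong->textContent);
    // Gather the text nodes that follow the title, up to the next element.
    $text = '';
    for ($node = $strong->nextSibling;
         $node !== null && $node->nodeType === XML_TEXT_NODE;
         $node = $node->nextSibling) {
        $text .= $node->textContent;
    }
    $result[$title] = trim($text);
}

print_r($result);
```

This keeps working if the tags gain attributes or different whitespace, which is where a hand-written pattern usually starts to break.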
Maybe it's just a typo, but shouldn't you write something like:
preg_match_all("%</strong>(.*?)<strong>%s", $data, $all);
In your first example I don't see you closing the tags, but in the "real" example you are. Maybe that's it.
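For the closed-tag form, a sketch along these lines might work. It goes a bit beyond the pattern above: it also captures the title, and it uses a lookahead with \z so the last block matches even when no <strong> follows it. The sample input and the $pairs name are assumptions.

```php
<?php
// Hypothetical input shaped like the "real" data in the question.
$data = <<<EOD
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
EOD;

// Group 1: the title. Group 2: everything up to the next <strong>
// or the end of the input (\z), whichever comes first.
preg_match_all(
    '%<strong>(.*?)</strong>(.*?)(?=<strong>|\z)%s',
    $data,
    $matches,
    PREG_SET_ORDER
);

$pairs = [];
foreach ($matches as $match) {
    $pairs[trim($match[1])] = trim($match[2]);
}

print_r($pairs);
```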