A RegEx problem in PHP

I could not think of a proper title. I have some data like this:

$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

Basically, the above is just an example. In reality, the data looks like this:

<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA

Sample: http://pastebin.com/cxzZWDZ8

Now I apply the following RegEx.

preg_match_all("%<strong>(.*?)<strong>%s", $data, $all);

This matches HHHHH and RRRRRRR, but I also want to match TTTTT. How can I do this?


You could use a lookahead assertion to ensure the <strong> is there, but isn't part of the match (so it can be part of the next match):

</strong>(.*?)(?=<strong>)
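
As a rough sketch of why the lookahead helps, here is the same idea applied to the sample data from the question. Because that sample only contains opening <strong> tags, the pattern below starts with <strong> rather than </strong>; that adaptation, the $blocks variable, and the trimming step are my own assumptions, not part of the original answer.

```php
<?php
// Sample data copied from the question's first example.
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

// (?=<strong>) only checks that the next tag is there; it does not consume it,
// so the same tag can open the following match.
preg_match_all('%<strong>(.*?)(?=<strong>)%s', $data, $all);

// Strip the newlines surrounding each captured block.
$blocks = array_map('trim', $all[1]);
print_r($blocks);
```

With the original pattern, the second <strong> is consumed as the end of the first match, so TTTTT is skipped. With the lookahead, all three blocks should be captured.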

However, if what you've got is HTML, you should read it with an HTML parser rather than regex, which is infamously poor at parsing HTML/XML markup. With DOMDocument::loadHTML(), getElementsByTagName() and so on, you'll have a much more reliable way of scraping page data.
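
A minimal sketch of the parser approach, assuming the real data has the "title in <strong>, data as following text" shape shown in the question. The $html string is a stand-in for the actual page (the pastebin sample may differ), and the wrapping <div> and the $result array are illustrative choices of mine.

```php
<?php
// Hypothetical markup shaped like the question's "real" data.
$html = "<strong>Some Title</strong>\nDATA\n<strong>Some other Title</strong>\nOTHER DATA\n";

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Wrap the fragment in a div so it has a single container element.
$doc->loadHTML('<div>' . $html . '</div>');
libxml_clear_errors();

$result = [];
foreach ($doc->getElementsByTagName('strong') as $strong) {
    $title = trim($strong->textContent);
    // Gather the text nodes that follow this tag, stopping at the next element.
    $text = '';
    for ($node = $strong->nextSibling;
         $node !== null && $node->nodeType === XML_TEXT_NODE;
         $node = $node->nextSibling) {
        $text .= $node->nodeValue;
    }
    $result[$title] = trim($text);
}
print_r($result);
```

This pairs each title with the text that follows it, and it also picks up the last block, which has no trailing <strong> for a regex lookahead to find.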


Maybe it's just a typo, but shouldn't you write something like:

preg_match_all("%</strong>(.*?)<strong>%s", $data, $all);

In your first example I don't see you closing the tags, but in the "real" example you are. Maybe that's it.
