PHP regex: optionally match a whole word

I'm using PHP and I need to scrape some information from cURL responses from a site. I am simulating both an AJAX request by a browser and a normal (entire) page request by a browser; however, the AJAX response is slightly different from the entire-page response in this section of the HTML.

The AJAX response is: <div id="accountProfile"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">

However, the normal response is: <div id="accountProfile"><html xmlns="http://www.w3.org/1999/xhtml"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">

That is, the AJAX response is missing the tag <html xmlns="http://www.w3.org/1999/xhtml">. I need to get the text between the h2 tags. Obviously I can't just scrape the page for <h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">, since these tags may occur in other places and not contain the information I want.

I can match either one of the patterns individually, but I would like to handle both with a single regex. Here is my solution for matching the AJAX response:

<?php
$pattern = '/\<div id="accountProfile"\>\<h2\>(.+?)\<\/h2\>\<dl id="accountProfileData"\>/';
preg_match($pattern, $haystack, $matches);
print_r($matches);
?>

Can someone show me how I should alter the pattern to optionally match the <html xmlns="http://www.w3.org/1999/xhtml"> tag as well? If it helps to simplify the haystack for the purposes of brevity, that's fine.


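I haven't tested it, but you can try this:

    $pattern = '#<div id="accountProfile">(?:<html xmlns="http://www\.w3\.org/1999/xhtml">)?<h2>(.+?)</h2><dl id="accountProfileData">#';

The optional tag is wrapped in a non-capturing group, (?:...), followed by ?, so it may appear zero or one times and the text you want stays in $matches[1]. The delimiter is # rather than / because the URL contains forward slashes, which would otherwise have to be escaped as \/ to avoid ending the pattern early.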
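As a rough check, here is a minimal sketch that runs one pattern against both response shapes from the question. It assumes PHP 7.1 or later (for the nullable return type). The helper name extractHeading and the variables $ajax and $full are made up for illustration. The pattern uses # as the delimiter so the slashes in the URL need no escaping, and a non-capturing optional group so the heading text is always capture group 1.

```php
<?php
// Sample haystacks copied from the two response shapes in the question.
$ajax = '<div id="accountProfile"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">';
$full = '<div id="accountProfile"><html xmlns="http://www.w3.org/1999/xhtml"><h2>THIS IS THE BIT I WANT</h2><dl id="accountProfileData">';

// (?: ... )? makes the <html ...> tag optional without creating a capture group.
$pattern = '#<div id="accountProfile">(?:<html xmlns="http://www\.w3\.org/1999/xhtml">)?<h2>(.+?)</h2><dl id="accountProfileData">#';

// Hypothetical helper: returns the heading text, or null when nothing matches
// (or when preg_match reports an error by returning false).
function extractHeading(string $pattern, string $haystack): ?string
{
    return preg_match($pattern, $haystack, $m) === 1 ? $m[1] : null;
}

$fromAjax = extractHeading($pattern, $ajax);
$fromFull = extractHeading($pattern, $full);

var_dump($fromAjax, $fromFull);
```

Both calls should yield the same heading text. If the markup gets any more varied than this, a DOM parser is likely to hold up better than a regex.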