Scrape data from an HTML page with PHP
I need to scrape data from an HTML page like this:
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
I need to extract the text "USA USA" and the number 157204 from onclick="setList1(157204);return false;".
...
You should use DOMDocument or XPath. RegEx is generally not recommended for parsing HTML.
Use a regular expression:
/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si
with preg_match() (first match only) or preg_match_all() (all matches).
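A minimal sketch of the regex approach, assuming the markup from the question is already held in a string (here `$html`, a hypothetical variable; in practice it might come from file_get_contents()). With PREG_SET_ORDER, preg_match_all() returns one array per match, where index 1 is the number inside setList1(...) and index 2 is the title attribute.

```php
<?php
// Markup from the question, assumed to be the page content.
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
HTML;

// Group 1: digits inside setList1(...); group 2: the title attribute of the same tag.
// [^>]+ keeps the match inside a single tag, so the rate('157204') link is ignored.
$pattern = '/setList1\(([0-9]+)\)[^>]+title="([^"]+)"/si';

$results = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $results[] = ['id' => (int) $m[1], 'title' => $m[2]];
    }
}

print_r($results);
```

This is brittle if the attribute order changes (for example, title before onclick), which is one reason the DOM-based answers are usually preferred.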
Please go through my previous answers about how to handle HTML with DOM.
XPath to get the text content of all anchor elements:
//a/text()
XPath to get the title attribute of all anchor elements:
//a/@title
XPath to get the onclick attribute of all anchor elements:
//a/@onclick
You will then need a string function to extract the number from the onclick text.
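A minimal sketch combining DOMDocument and DOMXPath, assuming the same markup is held in a string. Instead of the plain //a expressions, it narrows the query to anchors whose class contains contentSubHead (an assumption based on the sample markup), because //a/@onclick on this page would also return the rate('157204') handler of the rating link. sscanf() serves as the string function that pulls the number out of the onclick text.

```php
<?php
$html = <<<'HTML'
<div style="margin-top: 0px; padding-right: 5px;" class="lftFlt1">
<a href="" onclick="setList1(157204);return false;" class="contentSubHead" title="USA USA">USA USA</a>
<div style="display: inline; margin-right: 10px;"><a href="" onclick="rate('157204');return false;"><img src="http://icdn.raaga.com/3_s.gif" title="RATING: 3.29" style="position: relative; left: 5px;" height="10" width="60" border="0"></a></div>
</div>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Match only the title links (class token "contentSubHead"), not the rating links.
$anchors = $xpath->query('//a[contains(concat(" ", normalize-space(@class), " "), " contentSubHead ")]');
foreach ($anchors as $a) {
    $onclick = $a->getAttribute('onclick'); // e.g. "setList1(157204);return false;"
    // %d reads the integer that follows the literal "setList1(".
    [$id] = sscanf($onclick, 'setList1(%d)');
    $results[] = [
        'id'    => $id,
        'title' => $a->getAttribute('title'),
        'text'  => trim($a->textContent),
    ];
}

print_r($results);
```

Unlike the regex, this does not depend on the order of the attributes inside the tag.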
In my opinion, the best library for scraping is Simple HTML DOM. It basically uses jQuery selector syntax.
http://simplehtmldom.sourceforge.net/
The way you'd get the data in this example:
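include("simple_html_dom.php");
// file_get_html() loads a file or URL; str_get_html() expects the HTML itself as a string
$dom=file_get_html("page.html");
// link text of the first matching anchor
$text=$dom->find(".lftFlt1 a.contentSubHead",0)->plaintext;
//or, its title attribute
$text=$dom->find(".lftFlt1 a.contentSubHead",0)->title;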
I did it this way:
// $coll is the document already parsed with simple_html_dom
$a=$coll->find('div[class=lftFlt1]');
foreach($a as $element){
    $text=$element->find("a[class=cursor]",0)->onclick;
}