Scraping a website for info when the URL has product IDs instead of true values
I'm guessing it's PHP cURL, but what's the best way to write a loop that scrapes the DOM for info from a webpage that uses IDs in the URL query, like (?ProductId=103)? There are about 1200 pages. I need to find the innerHTML of the 9th span on each page. This info will just get stored in a MySQL table (id->value) for future scraping of this site.
Well, cURL might be faster (not sure), but if it is a one-off thing, then I would just use file_get_contents:
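// URL is assumed to be a constant holding the page address without the query string
for ($x = 0; $x < 1200; $x++) { // adjust the range to the real product IDs
    $f = file_get_contents(URL . '?ProductId=' . $x);
    if ($f === false) {
        continue; // page missing or request failed
    }
    // do stuff to $f
}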
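For the "do stuff" step, here is a minimal sketch of pulling the inner HTML of the n-th span out of a fetched page with PHP's built-in DOMDocument. The function name nthSpanInnerHtml is made up for illustration, and "9th span" is taken to mean the 9th span in document order, nested spans included.

```php
<?php
// Hypothetical helper: returns the inner HTML of the $n-th <span> (1-based),
// or null if the page cannot be parsed or has fewer than $n spans.
function nthSpanInnerHtml(string $html, int $n): ?string
{
    if ($html === '' || $n < 1) {
        return null; // loadHTML() rejects an empty string
    }

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    // Spans are returned in document order; item() is 0-based.
    $span = $doc->getElementsByTagName('span')->item($n - 1);
    if ($span === null) {
        return null;
    }

    // DOMDocument has no innerHTML property, so serialize each child node.
    $inner = '';
    foreach ($span->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

Inside the loop this would be called as nthSpanInnerHtml($f, 9). If the pages are not plain ASCII, the character encoding probably needs extra care, since loadHTML does not assume UTF-8 by default.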
Yes. Use cURL to retrieve the page, then use a DOM parser to get the info you need out of it. SimpleXML works if the markup is well-formed XML; for ordinary HTML, DOMDocument is more forgiving.
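A rough sketch of the cURL retrieval step, assuming the product pages answer with HTTP 200 on success. The function names productUrl and fetchPage, the example.com address, and the timeout values are placeholders, not anything taken from the site in question.

```php
<?php
// Hypothetical helper: builds the page URL for one product ID.
function productUrl(string $base, int $id): string
{
    return $base . '?' . http_build_query(['ProductId' => $id]);
}

// Hypothetical helper: returns the page body, or null on any failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds, assumed values
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and non-200 responses the same way.
    return ($body !== false && $status === 200) ? $body : null;
}
```

Usage would look like fetchPage(productUrl('http://example.com/product.php', 103)), with the result handed to whichever DOM parser you pick.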
cURL
To speed things up, you could use cURL's multi interface (the curl_multi_* functions) =>
https://stackoverflow.com/search?q=[php]+multi_curl
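As a rough idea of what the multi interface looks like, here is a sketch that downloads a batch of URLs in parallel. The function name fetchAll and the timeouts are assumptions for illustration; it treats anything other than an HTTP 200 response as a failure.

```php
<?php
// Hypothetical helper: fetches all $urls in parallel.
// Returns an array with the same keys, holding the body or null on failure.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_CONNECTTIMEOUT => 5,  // seconds, assumed values
            CURLOPT_TIMEOUT        => 15,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Firing all 1200 requests at once is probably unkind to the server, so it may be better to split the URL list with array_chunk and call this on something like 10 URLs at a time.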
scraping
The scraping part has been answered better before => for example https://stackoverflow.com/questions/3885760/scraping-and-web-crawling-framework-php.
You should search => https://stackoverflow.com/search?q=[php]+web+scraping
MySQL
I don't know if you already do, but you should be using PDO with prepared statements to protect against SQL injection.
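A minimal sketch of storing one id->value pair through a PDO prepared statement. The table product_values with columns id (primary key) and span_html, and the function name saveValue, are hypothetical. REPLACE INTO is used so that re-scraping an ID overwrites the old row.

```php
<?php
// Hypothetical table: product_values(id INT PRIMARY KEY, span_html TEXT)
function saveValue(PDO $pdo, int $id, string $value): void
{
    // Placeholders keep the scraped text out of the SQL string itself.
    $stmt = $pdo->prepare(
        'REPLACE INTO product_values (id, span_html) VALUES (:id, :span_html)'
    );
    $stmt->execute([':id' => $id, ':span_html' => $value]);
}
```

The connection would be opened once before the loop, for example with new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'password') where the host, database name and credentials are placeholders, and PDO::ATTR_ERRMODE set to PDO::ERRMODE_EXCEPTION so failures are not silent.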