
Scraping a website for info when the URL has product IDs instead of true values

I'm guessing it's PHP cURL, but what's the best way to write a loop that scrapes the DOM for info from a webpage that uses IDs in the URL query string, like (?ProductId=103)? There are about 1200 pages. I need to find the innerHTML of the 9th span on each page. This info will just get stored in a MySQL table (id->value) for future scraping of this site.


Well, cURL might be faster (not sure), but if it is a one-off thing, then I would just use file_get_contents:

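for ($x = 0; $x < 1200; $x++) {
    // URL is a constant holding the page address, without the query string
    $f = file_get_contents(URL . '?ProductId=' . $x);
    // do stuff to $f (file_get_contents returns false on failure)
}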
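The "do stuff" step for the question's "innerHTML of the 9th span" could look roughly like the sketch below. It uses PHP's bundled DOMDocument; the function name nthSpanInnerHtml is made up for illustration, and it assumes "9th span" means the ninth span element in document order.

```php
<?php
// Hypothetical helper: inner HTML of the n-th <span> (1-based), or null if absent.
function nthSpanInnerHtml(string $html, int $n = 9): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $span = $doc->getElementsByTagName('span')->item($n - 1);
    if ($span === null) {
        return null; // fewer than $n spans on this page
    }

    // DOMDocument has no innerHTML property, so serialize each child node.
    $inner = '';
    foreach ($span->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

Inside the loop this would be called as nthSpanInnerHtml($f) after checking that $f is not false.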


Yes. Use cURL to retrieve the page, then use a DOM parser like SimpleXML to get the info you need out of it. (SimpleXML only accepts well-formed XML; DOMDocument is more forgiving with real-world HTML.)
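A minimal sketch of the cURL retrieval step, assuming the curl extension is enabled. The helper names productUrl and fetchPage and the example.com address are placeholders, not anything from the site in question.

```php
<?php
// Hypothetical helper: build the page URL for one product ID.
function productUrl(string $base, int $id): string
{
    return $base . '?' . http_build_query(['ProductId' => $id]);
}

// Hypothetical helper: fetch one page, returning the body or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    // Treat transport errors and non-200 responses as "no page".
    return ($body !== false && $code === 200) ? $body : null;
}

// Usage (placeholder address):
// $html = fetchPage(productUrl('http://example.com/product.php', 103));
```

The returned HTML string can then be handed to whichever DOM parser you prefer.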


cURL

To speed things up you could use the curl_multi_* functions ("multi_curl") =>

https://stackoverflow.com/search?q=[php]+multi_curl
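For what it's worth, here is a rough sketch of fetching the pages in parallel batches with the curl_multi_* functions. The helper names, the batch size of 10, and the example.com address are my own assumptions; tune the batch size so you don't hammer the site.

```php
<?php
// Hypothetical helper: map product ID => page URL for a range of IDs.
function productUrls(string $base, int $first, int $last): array
{
    $urls = [];
    for ($id = $first; $id <= $last; $id++) {
        $urls[$id] = $base . '?ProductId=' . $id;
    }
    return $urls;
}

// Hypothetical helper: fetch one batch of URLs in parallel.
// Returns an array with the same keys; the value is the body, or null on failure.
function fetchBatch(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [CURLOPT_RETURNTRANSFER => true, CURLOPT_TIMEOUT => 30]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $ok = curl_getinfo($ch, CURLINFO_RESPONSE_CODE) === 200;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage (placeholder address), 10 pages at a time:
// foreach (array_chunk(productUrls('http://example.com/product.php', 1, 1200), 10, true) as $batch) {
//     foreach (fetchBatch($batch) as $id => $html) { /* parse and store */ }
// }
```

Passing true as the third argument to array_chunk keeps the product IDs as array keys, so each result can still be matched to its ID.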

Scraping

The scraping part has been answered better before => for example https://stackoverflow.com/questions/3885760/scraping-and-web-crawling-framework-php

You should search => https://stackoverflow.com/search?q=[php]+web+scraping

MySQL

I don't know whether you already do, but you should be using PDO with prepared statements to keep it safe from SQL injection.
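A small sketch of what that could look like for the id->value table. The table name product_values, its columns, the helper name saveValue, and the connection details are all made up for illustration; it assumes product_id is the primary key so that re-scraping overwrites the old row.

```php
<?php
// Hypothetical helper: store one scraped value with a prepared statement.
// Assumes a table like:
//   CREATE TABLE product_values (product_id INT PRIMARY KEY, value TEXT)
function saveValue(PDO $pdo, int $id, string $value): void
{
    // Placeholders keep the scraped text out of the SQL string itself.
    $stmt = $pdo->prepare(
        'REPLACE INTO product_values (product_id, value) VALUES (:id, :value)'
    );
    $stmt->execute([':id' => $id, ':value' => $value]);
}

// Usage (placeholder credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'pass');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// saveValue($pdo, 103, 'scraped text');
```

REPLACE INTO deletes any existing row with the same primary key before inserting, which suits a table you expect to refresh on each scrape.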
