Scraping a webstore whose pages have AJAX-controlled item counts?

I maintain a hobby website that, among other things, chronicles whether certain items are in print or out of print at a particular web store.

The store's management removes products when they are out of stock, and re-adds the pages when they're back in stock.

Scraping the category page's item list for item titles is easy enough, but I'm not sure what to do about pages with more results than are shown.

The pages default to 10 items, and clicking Next loads up the next 10 via AJAX.

Is there a standard way of handling and scraping such setups?


If you use the developer tools in your web browser (Firebug, Inspector, Developer Tools, ...), you should be able to see the connections being made to retrieve the data through AJAX, along with the request and response headers being sent and received.

The request details will show the URL that was requested as well as the data being sent. The query string of the URL or the POST data will most likely contain a "start" or "next" or some other type of parameter that identifies the start and number of results to return.

You can then use PHP and cURL to automate the rest of the process.
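As a rough sketch of that automation, here is the paging loop written in Python with the standard library instead of PHP and cURL; the logic carries over directly. The endpoint URL, the "start" and "count" parameter names, and the JSON response format are all assumptions made for illustration. Use whatever the inspector actually shows for your store, and swap the JSON decoding for HTML parsing if the endpoint returns markup.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the Request URL seen in the inspector.
BASE_URL = "https://store.example.com/ajax/items"
PAGE_SIZE = 10  # the store in the question shows 10 items per page


def fetch_page(start, count=PAGE_SIZE):
    """Request one page of results and return it as a decoded JSON list.

    Assumes the endpoint takes "start" and "count" in the query string
    and answers with a JSON array; both are guesses, not known facts.
    """
    query = urllib.parse.urlencode({"start": start, "count": count})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scrape_all(fetch=fetch_page, page_size=PAGE_SIZE, max_pages=1000):
    """Collect items page by page until a short or empty page comes back."""
    items = []
    for page in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        batch = fetch(page * page_size, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            break  # fewer results than requested means this was the last page
    return items
```

Passing the fetch function in as an argument keeps the loop testable without touching the network, and makes it easy to add a delay between requests so the store is not hammered.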

Here's a screenshot of what the "Web Inspector" looks like in Safari 5.1 on OS X (Chrome looks identical):

[Screenshot not preserved: Web Inspector showing the details of an AJAX request]

What's relevant to you here is the Request URL, Request Method and what's under Form Data. The text on the left (in light grey) is the parameter and the text on the right is the value.
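To make the mapping concrete, the sketch below turns those three inspector fields into a request object using Python's standard library. The URL and the form field names in the usage lines are invented for illustration, and the X-Requested-With header is an assumption: many JavaScript libraries send it and some servers check for it, but your store may not need it.

```python
import urllib.parse
import urllib.request


def build_request(url, method, form_data):
    """Recreate a browser AJAX call from the values shown in the inspector.

    url       -> the "Request URL" field
    method    -> the "Request Method" field ("GET" or "POST")
    form_data -> the parameter/value pairs listed under "Form Data", or None
    """
    body = None
    if form_data:
        # Form Data is sent URL-encoded in the request body.
        body = urllib.parse.urlencode(form_data).encode("ascii")
    req = urllib.request.Request(url, data=body, method=method)
    if body is not None:
        req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Assumption: mark the call as AJAX in case the server checks for it.
    req.add_header("X-Requested-With", "XMLHttpRequest")
    return req


# Hypothetical values; copy the real ones from the inspector.
req = build_request(
    "https://store.example.com/ajax/items",
    "POST",
    {"category": "books", "start": 10, "count": 10},
)
# urllib.request.urlopen(req) would then send it.
```

Changing only the "start" value between calls should then walk through the result pages, provided the server really does page that way.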
