scraping a webstore with pages having AJAX controlled item counts?

2023-03-16 10:17 Asked by:

I maintain a hobby website that, among other things, chronicles whether certain items are in print or out of print at a particular web store.

The store's management removes products when they are out of stock, and re-adds the pages when they're back in stock.

Scraping the category page's item list for item titles is easy enough, but I'm not sure what to do about pages with more results than are shown.

The pages default to 10 items, and clicking Next loads up the next 10 via AJAX.

Is there a standard way of handling and scraping such setups?

If you use your web browser's developer tools (Firebug, Web Inspector, Developer Tools, ...), you should be able to see the connections being made to retrieve the data through Ajax, along with the request and response headers being sent and received.

The request headers will contain the data being sent as well as the URL that's been requested. The query string of the URL or the POST data will most likely contain a "start" or "next" parameter, or something of that type, that identifies the starting offset and the number of results to return.

You can then use PHP and cURL to automate the rest of the process.
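The answer suggests PHP and cURL, but the loop is the same in any language. A minimal sketch in Python (standard library only) might look like the following. The endpoint URL, the `start` and `count` parameter names, and the JSON response shape (a list of objects with a `title` field) are all assumptions made for illustration; substitute whatever the browser's network panel actually shows for the store.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter names; copy the real ones from the
# request shown in the browser's developer tools.
BASE_URL = "https://store.example.com/category/items"
PAGE_SIZE = 10


def fetch_page(start, count=PAGE_SIZE):
    """Fetch one page of results and return the decoded JSON list."""
    url = BASE_URL + "?" + urlencode({"start": start, "count": count})
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_titles(fetch=fetch_page, page_size=PAGE_SIZE, max_pages=100):
    """Request successive pages until a short or empty page is returned."""
    titles = []
    for page in range(max_pages):  # hard cap guards against endless loops
        items = fetch(page * page_size, page_size)
        titles.extend(item["title"] for item in items)
        if len(items) < page_size:  # fewer items than requested: last page
            break
    return titles
```

The `fetch` argument is there so the paging loop can be exercised against canned data without touching the network. If the real endpoint returns an HTML fragment rather than JSON, the body of `fetch_page` would need an HTML parser in place of `json.loads`.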

Here's a screenshot of what the "Web Inspector" looks like in Safari 5.1 on OS X (Chrome looks identical):

[Screenshot: Web Inspector request details]

What's relevant to you here is the Request URL, Request Method and what's under Form Data. The text on the left (in light grey) is the parameter and the text on the right is the value.
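To make that mapping concrete, here is a hedged sketch of how those three pieces (Request URL, Request Method, Form Data) could be turned into a request object with Python's standard library. The URL and the `start`/`count` form fields are placeholders, not values taken from the store in question, and the extra header is an assumption that may or may not be needed.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, method, form_data):
    """Recreate a request from the Request URL, Request Method and Form Data."""
    # Form Data is sent as a URL-encoded body: "name=value&name=value".
    body = urlencode(form_data).encode("ascii") if form_data else None
    return Request(
        url,
        data=body,
        method=method,
        # Some sites only answer AJAX endpoints when this header is present;
        # whether this particular store needs it is an assumption to verify.
        headers={"X-Requested-With": "XMLHttpRequest"},
    )


# Placeholder values standing in for what the inspector shows.
request = build_request(
    "https://store.example.com/ajax/items",
    "POST",
    {"start": 10, "count": 10},
)
```

Passing `request` to `urllib.request.urlopen` would send it; nothing is sent by the code above.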

Related topics: curl, php, web-scraping
