Reading Web 2.0 HTML Source Code with Perl

Is it possible to read "Web 2.0" HTML source code that is dynamically generated? Perl's LWP, with its agent->response cycle, does not pick up any dynamically generated HTML.

Many websites today generate their HTML dynamically. If I am shopping for the best prices, and the prices are dynamically fetched and dumped into the page, then I am out of business.

Are we reaching the end of an era?


Yes, we are reaching the end of the era of unreliable screen scraping and the beginning of the era of well-defined APIs.

Personally I hate the term "Web 2.0", but at least Wikipedia lists web APIs as an important part of the whole thing.


If by "Web 2.0 HTML" and "dynamically generated" you mean "DOM generated from JavaScript" then you have to process the JavaScript.

You can do that manually, either by writing code to scrape the data out of the JS or by going straight to whatever data sources the JS itself uses. Alternatively, you can use a JS-aware parser (I usually use MozRepl these days).
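As a rough illustration of the "use whatever data sources the JS does" route, here is a minimal sketch. It assumes you have found, for example in the browser's network inspector, that the page's JavaScript loads its prices from a JSON endpoint. The URL, the `items` key, and the `sku`/`price` fields are all hypothetical; a real site will have its own layout.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical endpoint: replace with whatever URL the page's
# JavaScript actually requests.
my $PRICE_URL = 'https://shop.example.com/api/prices';

# Fetch the raw JSON bytes that the page's JavaScript would have requested.
sub fetch_json_text {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded lazily; only needed for live requests
    my $ua  = LWP::UserAgent->new( timeout => 10 );
    my $res = $ua->get($url);
    die "Request failed: " . $res->status_line . "\n"
        unless $res->is_success;
    # charset => 'none' undoes any Content-Encoding but leaves the body
    # as bytes, which is what decode_json expects (UTF-8 encoded input).
    return $res->decoded_content( charset => 'none' );
}

# Turn the JSON payload into a { sku => price } hash reference.
# Assumes a payload shaped like {"items":[{"sku":"...","price":...}, ...]}.
sub parse_prices {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    my %price_for;
    for my $item ( @{ $data->{items} || [] } ) {
        $price_for{ $item->{sku} } = $item->{price};
    }
    return \%price_for;
}

# Example use against a live site (commented out, since the URL is made up):
# my $prices = parse_prices( fetch_json_text($PRICE_URL) );
# print "$_: $prices->{$_}\n" for sort keys %$prices;
```

This skips the rendered HTML entirely and reads the same data the browser would, so it tends to be less brittle than parsing generated markup. It is still subject to the site changing its endpoint without notice.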

Keep in mind that the terms and conditions of many sites forbid screen scraping.

The best solution is to use an API which is stable and not subject to change. The documentation for the site you wish to get data from may describe an API, or you can contact the developers and see if they can make one available to you.
