
How do I crawl a site with dynamic forms using WWW::Mechanize?

I'd like to write a web crawler in Perl that retrieves and stores the values of an HTML table from a website that uses some JavaScript and whose URL ends in .aspx.

The website provides election results.

The search form has two drop-down menus: Province (provlist) and City/Municipality (munlist).

  • You choose a Province. The page reloads at the same URL, and the options in the second drop-down menu, City/Municipality, change accordingly.
  • You then choose a City/Municipality and click the SEARCH button; an HTML table with the results appears.

I'd like to retrieve all of these tables and their results.

I'd like to do this in Perl, but so far I have only written very small, simple scripts. Some general guidance on how to start this task would be very helpful.

  1. I have used a few WWW::Mechanize functions before. Are the WWW::Mechanize functions sufficient for this job, or do I need additional packages?
  2. The WWW::Mechanize FAQ states that it has some problems with JavaScript. However, I read in another post that it may be possible to avoid the JavaScript. Does the JavaScript function called by one of the drop-down menus cause a problem?

    <select name="provlist" onchange="javascript:setTimeout('__doPostBack(\'provlist\',\'\')', 0)" id="provlist" tabindex="1">
    
  3. How troublesome is the ASPX framework?

As I said, I have only a little experience writing Perl crawlers, so any information or hints you can provide are highly appreciated.


  1. Yes, WWW::Mechanize is sufficient.
  2. The form degrades gracefully without JavaScript. Submit the form once with a different provlist item, e.g. AGUSAN DEL NORTE, and the response page will contain the matching munlist (BUENAVISTA, etc.); the form will be set to the first item in that list, and the table will show the data for that first item.
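A minimal Perl sketch of point 2, under stated assumptions: the page URL (not given in the question) is passed on the command line; provlist and munlist are the field names from the question; the search button is assumed to have the value SEARCH; and because the results table's markup is unknown, the hypothetical helper table_rows dumps every table row on the page, so you would narrow it to the results table once you have looked at its HTML. The names option_values, table_rows and crawl are illustrative. On the ASPX question: ASP.NET state fields such as __VIEWSTATE are ordinary hidden inputs, so WWW::Mechanize sends them back with every submit without extra work.

```perl
#!/usr/bin/perl
# Sketch: walk every Province/City combination of an ASP.NET search form
# with WWW::Mechanize, without running any JavaScript.
# provlist/munlist come from the question; the SEARCH button value,
# the command-line URL and the table handling are assumptions.
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TokeParser;

# Non-empty option values of the <select> named $name in an HTML::Form.
sub option_values {
    my ($form, $name) = @_;
    my $input = $form->find_input($name) or die "No field named '$name'\n";
    return grep { defined $_ && length $_ } $input->possible_values;
}

# Every table row in $html as an array ref of trimmed cell texts.
# It does not distinguish the results table from layout tables.
sub table_rows {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my (@rows, $row);
    while (my $tok = $p->get_token) {
        my ($type, $tag) = @$tok;
        if ($type eq 'S' && $tag eq 'tr') {
            $row = [];
        }
        elsif ($type eq 'S' && ($tag eq 'td' || $tag eq 'th')) {
            # Text up to the cell's closing tag (or the end of the row).
            push @$row, $p->get_trimmed_text("/$tag", '/tr') if $row;
        }
        elsif ($type eq 'E' && $tag eq 'tr') {
            push @rows, $row if $row && @$row;
            undef $row;
        }
    }
    return @rows;
}

sub crawl {
    my ($url) = @_;
    binmode STDOUT, ':encoding(UTF-8)';
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($url);
    my @provinces = option_values($mech->form_with_fields('provlist'), 'provlist');

    for my $prov (@provinces) {
        # Submitting with a new province stands in for the onchange
        # __doPostBack; the response lists that province's munlist options.
        $mech->form_with_fields('provlist');
        $mech->select(provlist => $prov);
        $mech->submit;

        my @cities = option_values($mech->form_with_fields('munlist'), 'munlist');
        for my $city (@cities) {
            $mech->form_with_fields('munlist');
            $mech->select(munlist => $city);
            $mech->click_button(value => 'SEARCH');   # assumed button value
            for my $row (table_rows($mech->content)) {
                print join("\t", $prov, $city, @$row), "\n";
            }
            sleep 1;    # be polite to the server
        }
    }
}

# Usage: perl crawl.pl URL > results.tsv   (the URL is not given in the question)
crawl($ARGV[0]) if @ARGV;
```

Redirecting the tab-separated output to a file stores the results; crawl.pl is a hypothetical file name.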



If you want to mechanize JavaScript-heavy pages, you probably want to look at WWW::Mechanize::Firefox.
