Perl WWW::Mechanize as child class; can't stay logged in to scraped site

I have a simple login script using Perl WWW::Mechanize. I am scripting logins to Moodle. When I do the login as plain procedural steps, it works. For example (assume "$site_url", USERNAME, and PASSWORD have been set appropriately):

#THIS WORKS
$updater->get("http://".$site_url."/login/index.php");
$updater->form_id("login");
$updater->field('username', USERNAME);
$updater->field('password', PASSWORD);
$updater->click();
$updater->get("http://".$site_url."/");
print $updater->content();

When I try to encapsulate these steps inside a child class of WWW::Mechanize, get(), content(), and the other methods seem to work, but logging in to the site does not. I have a feeling it has to do with variable scope, but I don't know how to resolve it.

Example (fails):

my $updater = new AutoUpdater( $site_url, USERNAME, PASSWORD );
$updater->do_login();

{
package AutoUpdater;
use base qw( WWW::Mechanize );

sub new {
    my $class = shift;
    my $self = {
        site_url => shift,
        USERNAME => shift,
        PASSWORD => shift,
   };
    bless $self, $class;
    return $self;
}

sub do_login {
    my $self = shift;
    $self->get("http://".$site_url."/");
    $self->get("http://".$site_url."/login/index.php");
    $self->form_id("login");
    $self->field("username", $self->{USERNAME});
    $self->field("password", $self->{PASSWORD});
    $self->click();
    $self->get("http://".$site_url."/");
    print $self->content();
}
}

This fails. "Fail" means it does not log in. It does grab the web page, though, and I can manipulate the HTML data. It just doesn't log in. Yargh! (Yes, the "yargh" was necessary)

Thanks!


Here's a revised version:

use strict;
use warnings;

my $updater = AutoUpdater->new( $site_url, USERNAME, PASSWORD );
$updater->do_login();

{
package AutoUpdater;
use parent qw( WWW::Mechanize );

sub new {
  my $class = shift;

  my $self = $class->SUPER::new();

  $self->{AutoUpdater} = {
    site_url => shift,
    USERNAME => shift,
    PASSWORD => shift,
  };

  return $self;
}

sub do_login {
  my $self = shift;
  my $data = $self->{AutoUpdater};

  $self->get("http://$data->{site_url}/login/index.php");
  $self->form_id("login");
  $self->field("username", $data->{USERNAME});
  $self->field("password", $data->{PASSWORD});
  $self->click();
  $self->get("http://$data->{site_url}/");
  print $self->content();
}
} # end package AutoUpdater

Some notes:

You should always use strict and warnings to help catch your mistakes.

Indirect object syntax is discouraged. Use Class->new instead of new Class.

The base pragma has some undesirable effects that can't be fixed for backwards compatibility reasons. The parent pragma was developed to replace it.

Your big problem was that Perl doesn't automatically initialize base classes. You have to explicitly call $class->SUPER::new if necessary.
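
To make the effect of the missing SUPER::new call concrete, here is a minimal sketch that runs with core Perl only. FakeAgent is a hypothetical stand-in, not part of WWW::Mechanize; the assumption is that the real base class, like this stand-in, creates state such as a cookie jar in its constructor. A subclass that blesses its own hash never gets that state, so inherited methods still run but have nowhere to keep the session.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a base class such as WWW::Mechanize.
    package FakeAgent;

    sub new {
        my ($class, %args) = @_;
        # The base constructor sets up state that later methods rely on.
        my $self = { cookie_jar => {}, %args };
        return bless $self, $class;
    }

    # Stores a cookie only if the constructor created a jar.
    sub remember {
        my ($self, $name, $value) = @_;
        return 0 unless $self->{cookie_jar};
        $self->{cookie_jar}{$name} = $value;
        return 1;
    }
}

{
    package BrokenChild;
    # -norequire: FakeAgent lives in this file, so there is nothing to load.
    use parent -norequire, 'FakeAgent';

    # Blesses its own hash, so FakeAgent::new never runs.
    sub new {
        my ($class, $site_url) = @_;
        return bless { site_url => $site_url }, $class;
    }
}

{
    package FixedChild;
    use parent -norequire, 'FakeAgent';

    # Lets the base class build the object, then adds its own data.
    sub new {
        my ($class, $site_url) = @_;
        my $self = $class->SUPER::new();
        $self->{FixedChild} = { site_url => $site_url };
        return $self;
    }
}

my $broken = BrokenChild->new('example.com');
my $fixed  = FixedChild->new('example.com');

# Both objects inherit remember(), but only one has a jar to write to.
my $broken_ok = $broken->remember(session => 'abc');
my $fixed_ok  = $fixed->remember(session => 'abc');

print "broken stored cookie: $broken_ok\n";
print "fixed stored cookie:  $fixed_ok\n";
```

Both children can call the inherited method, which matches the symptom in the question: pages are fetched, but nothing that depends on constructor-built state works.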

Your other big problem was understanding how object instance data is handled. Most Perl objects are hashrefs, and you access instance data using hashref syntax. When subclassing a class I didn't write, I like to use a second hashref to avoid conflicts with the parent class. Remember that you're sharing the object with the base classes. If your subclass uses the site_url field, and then a later release of the base class starts using site_url for something else, your code will suddenly break for no obvious reason. By using only one key in the base object hashref (and one that the base class is unlikely to start using), you minimize the chance of future breakage.
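
The key-collision risk can be sketched the same way. LegacyAgent below is a hypothetical base class that happens to use a site_url key of its own; none of these names come from WWW::Mechanize. FlatChild writes straight into the shared hash and clobbers the base class's value, while NamespacedChild keeps its data under a single key named after the subclass.

```perl
use strict;
use warnings;

{
    # Hypothetical base class that keeps its own site_url key.
    package LegacyAgent;

    sub new {
        my ($class) = @_;
        return bless { site_url => 'base-default' }, $class;
    }

    # Returns the value the base class believes it owns.
    sub base_site { my ($self) = @_; return $self->{site_url} }
}

{
    package FlatChild;
    use parent -norequire, 'LegacyAgent';

    sub new {
        my ($class, $url) = @_;
        my $self = $class->SUPER::new();
        $self->{site_url} = $url;    # overwrites the base class's key
        return $self;
    }
}

{
    package NamespacedChild;
    use parent -norequire, 'LegacyAgent';

    sub new {
        my ($class, $url) = @_;
        my $self = $class->SUPER::new();
        # All subclass data lives under one key in the shared hash.
        $self->{NamespacedChild} = { site_url => $url };
        return $self;
    }

    sub site_url { my ($self) = @_; return $self->{NamespacedChild}{site_url} }
}

my $flat = FlatChild->new('moodle.example');
my $ns   = NamespacedChild->new('moodle.example');

print "flat, as seen by base:       ", $flat->base_site, "\n";
print "namespaced, as seen by base: ", $ns->base_site,   "\n";
print "namespaced, subclass value:  ", $ns->site_url,    "\n";
```

The nested hashref costs one extra lookup per access, which is why do_login in the revised version copies it into $data once at the top.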

While Moose provides some nice features for OO Perl programming, if you're just writing a fairly simple subclass of a non-Moose class, it's probably best to avoid it.
