How can I fetch the same URL with different query strings with Perl's LWP::UserAgent?
I have read several articles about LWP, but I am still lost. The site in question lists many schools: there is an overview page, and following its links leads to the individual result pages.
I want to fetch the pages with LWP::UserAgent and parse them with either HTML::TreeBuilder::XPath or HTML::TokeParser.
At the moment I am trying to work out the right GET request, and I have some issues with LWP::UserAgent. The subpages of the overview can be reached via direct links, and each of them has its own content, for example the result pages mentioned above.
As a new user I cannot post the full URLs, but here are their endings:
id=21&extern_eid=709
id=21&extern_eid=789
id=21&extern_eid=1297
id=21&extern_eid=761
There are many URLs that differ only at the end. The question is: how do I run LWP::UserAgent? I want to fetch and parse all of the roughly 1000 pages.
Does LWP do the job automatically, or do I have to set up LWP::UserAgent so that it looks up the different URLs itself?
Possible solution: perhaps we have to count the extern_eid value up from zero (to 10000 or even 100000), replacing the 709 here:
www-db.sn.schule.de/index.php?id=21&extern_eid=709
BTW, here is the relevant part of the LWP::UserAgent documentation:
REQUEST METHODS
The methods described in this section are used to dispatch requests via the user agent. The following request methods are provided:
$ua->get( $url )
$ua->get( $url , $field_name => $value, ... )
This method will dispatch a GET request on the given $url. Further arguments can be given to initialize the headers of the request. These are given as separate name/value pairs. The return value is a response object. See HTTP::Response for a description of the interface it provides. There will still be a response object returned when LWP can't connect to the server specified in the URL or when other failures in protocol handlers occur.
The question is: how do I use LWP::UserAgent on the site mentioned above the right way, that is, efficiently?
I look forward to any and all help!
If I understand your question correctly, you are trying to use LWP::UserAgent on the same URL with different query arguments, and you are wondering whether LWP::UserAgent provides a way to loop through the query arguments?
I don't think LWP::UserAgent has a method for you to do that. However, you can have a loop constructing the URLs and use LWP::UserAgent repeatedly:
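use LWP::UserAgent;
my $ua  = LWP::UserAgent->new;
my $url = 'http://www-db.sn.schule.de/index.php';   # scheme assumed
for my $id (0 .. 100000) {
    # build the query string for the current extern_eid and fetch the page
    my $response = $ua->get("$url?id=21&extern_eid=$id");
    next unless $response->is_success;
    # rest of the code, e.g. parse $response->decoded_content
}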
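A slightly fuller sketch of the same idea, in case it helps. It assumes the base URL is http://www-db.sn.schule.de/index.php (the scheme is a guess, since the question only shows host and path) and uses the URI module's query_form to build the query string instead of concatenating it by hand. The helper name build_url is made up for illustration, and the list of IDs is just the four from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Assumed base URL; the question gives only host and path, not the scheme.
my $base = 'http://www-db.sn.schule.de/index.php';

# Hypothetical helper: returns the full URL for one extern_eid value.
sub build_url {
    my ($eid) = @_;
    my $uri = URI->new($base);
    $uri->query_form( id => 21, extern_eid => $eid );
    return $uri->as_string;
}

my $ua = LWP::UserAgent->new( timeout => 10 );

# The four IDs from the question; swap in a range such as 0 .. 100000.
for my $eid ( 709, 789, 1297, 761 ) {
    my $response = $ua->get( build_url($eid) );
    if ( $response->is_success ) {
        my $html = $response->decoded_content;
        printf "%s: fetched %d characters\n", $eid, length $html;
        # hand $html to HTML::TreeBuilder::XPath or HTML::TokeParser here
    }
    else {
        # get() returns a response object even on failure, so report and go on
        warn "extern_eid=$eid failed: ", $response->status_line, "\n";
    }
    sleep 1;    # be polite to the server
}
```

Since many IDs in a large range will probably not exist, checking is_success (and perhaps skipping empty result pages) matters more than the exact upper bound.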
Alternatively, you can add a request_prepare handler (via $ua->add_handler) that computes and adds the query arguments before the request is sent.
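A rough sketch of the handler approach, under the same assumption about the base URL. The variable $current_eid is a name chosen for this example: the loop sets it, and the handler writes it into the query string just before each request goes out. The m_host match restricts the handler to requests for that host, so requests to other hosts are left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed base URL; the scheme is a guess.
my $base = 'http://www-db.sn.schule.de/index.php';

my $ua = LWP::UserAgent->new( timeout => 10 );

# Set by the loop below, read by the handler (illustrative name).
my $current_eid;

# Called before each matching request is sent; rewrites its query string.
$ua->add_handler(
    request_prepare => sub {
        my ( $request, $ua, $handler ) = @_;
        return unless defined $current_eid;
        my $uri = $request->uri->clone;
        $uri->query_form( id => 21, extern_eid => $current_eid );
        $request->uri($uri);
        return;
    },
    m_host => 'www-db.sn.schule.de',
);

# The four IDs from the question; swap in a larger range as needed.
for my $eid ( 709, 789, 1297, 761 ) {
    $current_eid = $eid;
    my $response = $ua->get($base);    # the handler adds the query string
    if ( $response->is_success ) {
        printf "%s: %s\n", $eid, $response->request->uri;
    }
    else {
        warn "extern_eid=$eid failed: ", $response->status_line, "\n";
    }
    sleep 1;    # be polite to the server
}
```

For a plain counting loop this is more machinery than building the URL directly; a handler mostly pays off when the same rewriting has to apply to every request the user agent makes.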
You describe following links for the purpose of web scraping. WWW::Mechanize, a subclass of LWP::UserAgent, does this more easily than your current attempt.
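A minimal sketch of what that could look like. The overview URL below is an assumption (the question does not show it, so substitute the real one), and the regular expression simply assumes that the result-page links contain extern_eid=NUMBER, as in the URL endings from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Assumed overview page; replace with the real URL of the school list.
my $overview = 'http://www-db.sn.schule.de/index.php?id=21';

# autocheck => 0 so that failed requests do not die; we check success ourselves.
my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 10 );

$mech->get($overview);
if ( $mech->success ) {
    # Collect every link on the overview page that points to a result page.
    my @links = $mech->find_all_links( url_regex => qr/extern_eid=\d+/ );

    my %seen;
    for my $link (@links) {
        my $url = $link->url_abs->as_string;
        next if $seen{$url}++;    # skip duplicate links

        $mech->get($url);
        unless ( $mech->success ) {
            warn "$url failed: ", $mech->res->status_line, "\n";
            next;
        }
        my $title = $mech->title;
        printf "%s: %s\n", $url, defined $title ? $title : '(no title)';
        # $mech->content holds the HTML for HTML::TreeBuilder::XPath
        sleep 1;    # be polite to the server
    }
}
else {
    warn "could not fetch overview: ", $mech->res->status_line, "\n";
}
```

This way there is no need to guess ID ranges at all: only the links that actually appear on the overview page are fetched.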