Googling within a Perl script? [closed]
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
开发者_JS百科 Improve this questionI have a string in $var.
Using Perl, how can I pass this string to Google and get back an array of Google search results?
Before proceeding, please be aware of the Google Terms of Service.
You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.
That being said, there exists an official API to query web search programmatically.
The JSON/Atom Custom Search API lets you develop websites and programs to retrieve and display search results from your Google Custom Search programmatically. With this API, you can use RESTful requests to get search results in either JSON or Atom format.
You can use XML::Atom::Client or LWP+JSON::Any or many other libraries to perform the REST calls.
(You may still find references to the older Google Web Search API but it's deprecated and limited.)
Take a look at the Google Custom search API: http://code.google.com/apis/customsearch/
If you need to search over a wider variety of hosts, you'll need to use the older, deprecated Websearch API, but that will limit the number of queries you can make per day.
Barring that, you'll need to do a lot of html scraping and parsing.
Here is how a simple script could look like (and yes it violates TOS so it's just PoC, and you shouldn't use it...)
use WWW::Mechanize;
use 5.10.0;
use strict;
use warnings;
my $mech = new WWW::Mechanize;
my $option = shift;
#you may customize your google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';
my @dork = ("this is my search one","this is my search two");
#declare necessary variables
my $max = 0;
my $link;
my $sc = scalar(@dork);
#start the main loop, one itineration for every google search
for my $i ( 0 .. $sc ) {
#loop until the maximum number of results chosen isn't reached
while ( $max <= $option ) {
#say $google . $dork[$i] . "&start=" . $max;
$mech->get( $google . $dork[$i] . "&start=" . $max );
#get all the google results
foreach $link ( $mech->links() ) {
my $google_url = $link->url;
if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
say $google_url;
}
}
$max += 10;
}
}
By the way I wrote this a while back, so it's not exactly up to the par, but it does the job, and I am too lazy to boot linux to find the newer version of this...
精彩评论