Easiest way to scrape Google for URLs via my browser?
I'd like to scrape all the URLs my searches return when searching via Google. I've tried writing a script, but Google did not like it, and adding cookie support and CAPTCHA handling was too tedious. I'm looking for something that, while I'm browsing through the Google search pages, will simply take all the URLs on those pages and put them into a .txt file, or store them somehow. Does anyone know of something that will do that? Perhaps a Greasemonkey script or a Firefox addon? It would be greatly appreciated. Thanks!
See the JSON/Atom Custom Search API.
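A minimal sketch of calling it from PHP, assuming you have an API key and a custom search engine ID from the Google API console (the placeholder values below are illustrative); each result carries its URL in the "link" field of the JSON response:

<?php
# minimal sketch: fetch one page of Custom Search results and print each URL
$key = "YOUR_API_KEY";   # placeholder: your API key
$cx  = "YOUR_CSE_ID";    # placeholder: your custom search engine ID
$q   = urlencode("pokemon");
$json = file_get_contents(
    "https://www.googleapis.com/customsearch/v1?key=$key&cx=$cx&q=$q");
$data = json_decode($json, true);
foreach($data['items'] as $item)
{
    echo $item['link']."\n";   # the result URL
}
?>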
I've done something similar for Google Scholar, where there's no API available. My approach was basically to create a proxy web server (a Java web app on Tomcat) that would fetch the page, do something with it, and then show it to the user. This is a 100% functional solution but requires quite a bit of coding. If you are interested I can get into more details and put up some code.
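The shape of that fetch-process-serve proxy is simple. Here is a minimal sketch of the same idea in PHP (the original was a Java/Tomcat app; the url parameter and urls.txt file here are illustrative, not from that code):

<?php
# minimal proxy sketch: fetch the requested page, harvest its links, serve it back
$url  = $_GET['url'];              # illustrative: page the user asked the proxy for
$html = file_get_contents($url);   # the proxy fetches it server-side
$dom  = new DOMDocument();
@$dom->loadHTML($html);
$links = array();
foreach($dom->getElementsByTagName('a') as $a)
{
    $links[] = $a->getAttribute('href');
}
# "do something with it": here, append every link on the page to a text file
file_put_contents("urls.txt", implode("\n", $links)."\n", FILE_APPEND);
echo $html;                        # then show the page to the user
?>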
Google search results are very easy to scrape. Here is an example in PHP:
<?php
# a trivial example of how to scrape google
# send a browser-like User-Agent, since Google often blocks the default one
$context = stream_context_create(array(
    'http' => array('header' => "User-Agent: Mozilla/5.0\r\n"),
));
$html = file_get_contents("http://www.google.com/search?q=pokemon", false, $context);

$dom = new DOMDocument();
@$dom->loadHTML($html);   # @ suppresses warnings from Google's loose markup

$x = new DOMXPath($dom);
# each organic result link is an <a> inside an <h3> under the #ires container
foreach($x->query("//div[@id='ires']//h3//a") as $node)
{
    echo $node->getAttribute("href")."\n";
}
?>
You may try the IRobotSoft bookmark addon at http://irobotsoft.com/bookmark/index.html