I'd like to scrape all the URLs my searches return when searching via Google. I've tried writing a script, but Google did not like it, and adding cookie support and CAPTCHA handling was too tedious.
I want to write an application to capture data from a website, and the website uses AJAX to retrieve data from the server.
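When a page loads its data via AJAX, it is often simpler to call the underlying JSON endpoint directly than to render the page. A minimal Python sketch, assuming a hypothetical endpoint spotted in the browser's Network tab (the URL and parameters are placeholders, not the real site's API):

```python
import requests

# Hypothetical JSON endpoint discovered in the browser's Network tab.
AJAX_URL = "https://example.com/api/items"

def fetch_items(page=1):
    # Send the header the page's own AJAX calls use, so the server
    # treats this request like an ordinary in-page fetch.
    resp = requests.get(
        AJAX_URL,
        params={"page": page},
        headers={"X-Requested-With": "XMLHttpRequest"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for item in fetch_items():
        print(item)
```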
What I am trying to do is create a database of all bars in the United States. I need this database to be updated semi-regularly (every week or so) to include newly opened bars.
Currently, my Nokogiri script iterates through Google's SERPs until it finds the position of the target website. It does this for each keyword for each website that each user specifies (users are capped…
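The original script is Ruby/Nokogiri; the following is an analogous Python sketch of the same idea, walking result pages until the target domain appears. The selector is approximate, the query parameters are assumptions, and Google actively blocks automated queries, so treat this as illustration only:

```python
import requests
from bs4 import BeautifulSoup

def serp_position(keyword, target_domain, pages=5):
    """Return the 1-based position of target_domain in the results,
    or None if it is not found within the first `pages` pages."""
    position = 0
    for page in range(pages):
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": keyword, "start": page * 10},
            headers={"User-Agent": "Mozilla/5.0"},
            timeout=10,
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        # Rough heuristic: count outbound result links; the markup
        # changes often, so this selector will likely need adjusting.
        for link in soup.select("a[href]"):
            href = link["href"]
            if href.startswith("http"):
                position += 1
                if target_domain in href:
                    return position
    return None
```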
I want to write a Google Chrome extension that can take information from a site that I do not own (www.notmysite.com), send that info to a site I do own (www.mysite.com), and run some sort of MySQL query…
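The extension itself cannot talk to MySQL directly, so the usual pattern is to POST the scraped data to an endpoint on the site you own and run the query server-side. A minimal sketch of that receiving side, assuming a hypothetical `/api/save` route, database credentials, and table name:

```python
from flask import Flask, request, jsonify
import mysql.connector

app = Flask(__name__)

# Hypothetical endpoint on www.mysite.com that the extension's
# content script would POST scraped data to.
@app.route("/api/save", methods=["POST"])
def save():
    payload = request.get_json(force=True)
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="scrapes"
    )
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO captured (source_url, data) VALUES (%s, %s)",
        (payload.get("url"), payload.get("data")),
    )
    conn.commit()
    cur.close()
    conn.close()
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run()
```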
I'm trying to grab a list of all titles from the site Reddit.com using lxml. I used this query: reddit = etree.HTML(urllib.urlopen("http://www.reddit.com/r/all/top").read())
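A runnable Python 3 sketch of the same approach, with the missing imports and a follow-up XPath to pull out the titles; the `title` class name is an assumption based on old Reddit markup and may differ on the current site:

```python
from lxml import etree
from urllib.request import urlopen, Request

# Reddit rejects the default urllib user agent, so set one explicitly.
req = Request("http://www.reddit.com/r/all/top",
              headers={"User-Agent": "Mozilla/5.0"})
reddit = etree.HTML(urlopen(req).read())

# Assumption: post titles sit in <a> elements whose class includes "title".
titles = reddit.xpath('//a[contains(@class, "title")]/text()')
for title in titles:
    print(title)
```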
I need some help with my XPath query. I can get this code to work with just about every site I need to scrape except this small part of a particular site... I just get a blank page... Does anyone have an…
I am having a problem getting/staying logged in with Perl Mechanize to a website. Looking at the headers, it appears that the JSESSIONID keeps changing. I am using a cookie jar, but I think it's getting…
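The question uses Perl Mechanize; as an analogous illustration in Python, a `requests.Session` plays the role of the cookie jar, storing whatever `JSESSIONID` the server sets and sending it back on later requests. The login URL and form field names below are placeholders:

```python
import requests

session = requests.Session()

# Placeholder login URL and form fields; the session records any
# cookies (including JSESSIONID) returned in Set-Cookie headers.
login = session.post(
    "https://example.com/login",
    data={"username": "me", "password": "secret"},
    timeout=10,
)
login.raise_for_status()

# If the server rotates the session id after authentication, the new
# value replaces the old one in the session's cookie jar automatically.
print(session.cookies.get("JSESSIONID"))

# Subsequent requests reuse the stored cookies, so the login persists.
profile = session.get("https://example.com/account", timeout=10)
print(profile.status_code)
```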
I've been tasked with building a screen scraping application, and I'm looking for information on the best way to cope with web pages that would normally require user input and interaction…
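One common way to handle pages that expect clicks and form input is to drive a real browser. A minimal Selenium sketch in Python; the URL and element locators are placeholders, not a specific site's markup:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Drive a real browser so scripted pages behave as they would for a user.
driver = webdriver.Chrome()
try:
    driver.get("https://example.com/search")
    # Fill and submit a search form the way a user would.
    box = driver.find_element(By.NAME, "q")
    box.send_keys("example query")
    box.submit()
    # Read back whatever the page renders after the interaction.
    for row in driver.find_elements(By.CSS_SELECTOR, ".result"):
        print(row.text)
finally:
    driver.quit()
```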
I am using Simple HTML DOM Parser to scrape a website... How can I skip a particular class while in a loop? Judging from http://simplehtmldom.sourceforge.net/manual.htm#frag_find_attr you…
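The question is about PHP's Simple HTML DOM; the same "skip elements with a given class inside the loop" idea, sketched in Python with BeautifulSoup (the class names here are made up for the example):

```python
from bs4 import BeautifulSoup

html = """
<div class="item">keep me</div>
<div class="item ad">skip me</div>
<div class="item">keep me too</div>
"""

soup = BeautifulSoup(html, "html.parser")
for div in soup.find_all("div", class_="item"):
    # Skip any element that also carries the unwanted class.
    if "ad" in div.get("class", []):
        continue
    print(div.get_text(strip=True))
```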