Scrape Gmail for the last time external POP accounts were checked, and check them if more than X time has passed since the last check
Goal: To develop a script that will find out when Google last checked my external POP accounts, without my being logged in through a browser. If the time since the last check exceeds some amount, then check the POP account.
My Reason: I use an offline client. I don't want to be logged into Gmail, and I want all my external emails to flow through Gmail. Sometimes an important email comes in and I have to log into Gmail, go to the account section, and then click "check email". This is incredibly annoying. I wish they had the ability to poll POP accounts at a specified frequency. Instead they use an algorithm whose interval can range from 1 minute to 1 hour.
My Approaches so far: I can log into Gmail using curl, and I can scrape the pages. The problem is that Google uses JavaScript/AJAX goodness, so curl gets the HTML version of Gmail, and that version does not have the info that I am looking for. It's only available on the AJAX version of Gmail.
I can use Selenium, but then I essentially have to have Firefox open. I don't want that. I want a solution that can run in the background and check every 10 minutes.
My suspicions on how to go about this: I've seen several posts about using headless browsers with JavaScript capabilities. Apparently some of these can be controlled using Python. However, this seems quite complicated.
Thus, my questions: What is the best way to solve my problem? My preference is to use Python, but I am open to other languages as well. Will I have to use JavaScript to accomplish this task? Is a headless browser necessary, or are there other alternatives?
Thank you.
PhantomJS (http://www.phantomjs.org/) is probably going to be the best tool for this job. Its GitHub repository has lots of examples of how to do this type of thing, and people have had good success using it for complex scraping tasks.
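Whichever browser tool ends up doing the scraping, the "check if longer than X" part of the question can stay in plain Python. Here is a minimal sketch of that logic only. It assumes two hypothetical callables that you would have to write yourself on top of the headless browser: `get_last_checked`, which returns the time Gmail last polled the POP account (or `None` if unknown), and `trigger_check`, which performs the equivalent of clicking "check email". Neither is a real Gmail or PhantomJS API.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Optional


def needs_check(last_checked: Optional[datetime],
                now: datetime,
                max_age: timedelta) -> bool:
    """Return True if the last POP check is unknown or older than max_age."""
    if last_checked is None:
        return True
    return now - last_checked > max_age


def poll(get_last_checked: Callable[[], Optional[datetime]],  # hypothetical scraper hook
         trigger_check: Callable[[], None],                   # hypothetical "check email" hook
         max_age: timedelta = timedelta(minutes=10),
         interval_seconds: float = 600,
         iterations: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], datetime] = datetime.now) -> int:
    """Run the check loop and return how many times a check was triggered.

    iterations=None loops forever; a number limits the loop (useful for testing).
    """
    triggered = 0
    count = 0
    while iterations is None or count < iterations:
        if needs_check(get_last_checked(), clock(), max_age):
            trigger_check()
            triggered += 1
        count += 1
        # Sleep between rounds, but not after the final one.
        if iterations is None or count < iterations:
            sleep(interval_seconds)
    return triggered
```

Passing the scraping functions in as arguments keeps the timing logic independent of the browser tool, so the same loop works whether the hooks are backed by PhantomJS, Selenium, or something else. Running the script from cron every 10 minutes with `iterations=1` would be an alternative to the long-running loop.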