Mechanize not working for automating Gmail login on Google App Engine
I have used mechanize and deployed an app on GAE and it works fine. But for an app that I am making, I am trying to automate login to Gmail through mechanize. It doesn't work in the development environment on my local machine, nor after deploying on App Engine.
The same script runs fine on my own server through mod_python using PSP.
I found a lot of solutions here, but none of them seem to work for me. Here is a snippet of my code:
<snip>
br = mechanize.Browser()
response = br.open("http://www.gmail.com")
loginForm = br.forms().next()
loginForm["Email"] = self.request.get('user')
loginForm["Passwd"] = self.request.get('password')
response = br.open(loginForm.click())
response2 = br.open("http://mail.google.com/mail/h/")
result = response2.read()
<snip>
When I look at the result, all I get is the login page when used with appengine. But with mod_python hosted on my own server, I get the page with the user's inbox.
The problem is most likely due to how Google crippled the urllib2 module on GAE.
Internally it now uses the urlfetch module (Google's own HTTP client), and the HTTPCookieProcessor() functionality has been removed completely. That means cookies are NOT persisted from request to request, which is the critical piece when logging into sites programmatically.
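To see what that implies in practice, here is a minimal sketch (not part of the original answer) of what urlfetch forces you to do by hand: pull the Set-Cookie header off one response and send it back as a Cookie header on the next request. The URLs are placeholders.

    from google.appengine.api import urlfetch

    # Hypothetical example: two requests that must share a session cookie.
    first = urlfetch.fetch('https://example.com/login', follow_redirects=False)
    session_cookie = first.headers.get('set-cookie', '')

    # urlfetch will not resend the cookie for you; you have to attach it yourself.
    second = urlfetch.fetch('https://example.com/inbox',
                            headers={'Cookie': session_cookie},
                            follow_redirects=False)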
There is a way around this, but not using mechanize. You have to roll your own Cookie processor - here is the basic approach I took (not perfect, but it gets the job done):
import urllib, urllib2, Cookie
from google.appengine.api import urlfetch
from urlparse import urljoin
import logging

class GAEOpener(object):

    def __init__(self):
        self.cookie = Cookie.SimpleCookie()
        self.last_response = None

    def open(self, url, data = None):
        base_url = url
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        while url is not None:
            self.last_response = urlfetch.fetch(url = url,
                payload = data,
                method = method,
                headers = self._get_headers(self.cookie),
                allow_truncated = False,
                follow_redirects = False,
                deadline = 10
            )
            data = None # Next request will be a GET, so no need to send the data again.
            method = urlfetch.GET
            # Load the cookies from the response
            self.cookie.load(self.last_response.headers.get('set-cookie', ''))
            url = urljoin(base_url, self.last_response.headers.get('location'))
            if url == base_url:
                url = None
        return self.last_response

    def _get_headers(self, cookie):
        headers = {
            'Host' : '<ENTER HOST NAME HERE>',
            'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie' : self._make_cookie_header(cookie)
        }
        return headers

    def _make_cookie_header(self, cookie):
        cookie_header = ""
        for value in cookie.values():
            cookie_header += "%s=%s; " % (value.key, value.value)
        return cookie_header

    def get_cookie_header(self):
        return self._make_cookie_header(self.cookie)
You can use it like you would urllib2.urlopen, except that the method is called open.
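For example, a rough sketch of the same flow as the question, rewritten on top of GAEOpener. The login form action URL is a placeholder, and the "Email"/"Passwd" field names are taken from the question's mechanize code, so they may need adjusting to whatever Google actually serves; this also assumes it runs inside the same request handler as the question (hence self.request).

    import urllib

    opener = GAEOpener()
    # First request picks up the initial session cookies.
    opener.open("http://www.gmail.com")

    # POST the login form fields; replace the placeholder with the form's real action URL.
    form_data = urllib.urlencode({
        "Email": self.request.get('user'),
        "Passwd": self.request.get('password'),
    })
    opener.open("<LOGIN FORM ACTION URL>", form_data)

    # Later requests reuse the cookies stored on the opener.
    inbox = opener.open("http://mail.google.com/mail/h/")
    result = inbox.content  # urlfetch responses expose .content, not .read()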