Grab Form Data Via Python
I'm looking to grab the form data that needs to be passed to a specific website and submit it. Below is the HTML (form only) that I need to simulate. I've been working on this for a few hours but can't get anything to work. I want this to work on Google App Engine. Any help would be appreciated.
<form method="post" action="/member/index.bv">
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td align="left">
<h3>member login</h3><input type="hidden" name="submit" value="login" /><br />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
email:<br />
<input type="text" name="email" style="width: 140px;" />
</td>
</tr>
<tr>
<td align="left" style="color: #8b6c46;">
password:<br />
<input type="password" name="password" style="width: 140px;" />
</td>
</tr>
<tr>
<td>
<input type="image" class="formElementImageButton" src="/resources/default/images/btnLogin.gif" style="width: 46px; height: 17px;" />
</td>
</tr>
<tr>
<td align="left">
<div style="line-height: 1.5em;">
<a href="/join/" style="color: #8b6c46; font-weight: bold; text-decoration: underline; ">join</a><br />
<a href="/member/forgot/" style="color: #8b6c46; font-weight: bold; text-decoration: underline;">forgot password?</a><input type="hidden" name="lastplace" value="%2F"><br />
having trouble logging on, <a href="/cookieProblems.bv">click here</a> for help
</div>
</td>
</tr>
</table>
</form>
Currently I'm trying to use this code to access it, but it's not working. I'm pretty new to this, so maybe I'm just missing something.
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
Is this login page for a third-party site? If so, there may be more to it than simply posting the form inputs.
For example, I just tried this with the login page on one of my own sites. A simple POST request doesn't work in my case, and the same may be true of the login page you are accessing.
For starters, the login form may have a hidden CSRF token value that you have to send when posting your login request. This means you'd first have to GET the login page and parse the resulting HTML for the CSRF token value. The server may also require its session cookie in the login request.
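The parsing step doesn't strictly need a third-party library, which may matter if you aren't sure what packages your environment allows. Here is a minimal sketch using only Python 3's standard `html.parser` that collects every named hidden input on a page, which would cover a CSRF token as well as hidden fields like `submit` and `lastplace` in the form above. The class and function names are hypothetical, and `csrf_token` in the usage example is only an assumed field name; check the real page's source for the actual one.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value)
        # pairs, and a value may be None for valueless attributes.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of every named hidden input found in the HTML."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


# Usage (the field name "csrf_token" is an assumption):
# token = hidden_fields(login_page_html)["csrf_token"]
```

Self-closing tags such as `<input ... />` are routed through `handle_starttag` by `HTMLParser`'s default `handle_startendtag`, so both spellings of the input tag are picked up.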
I'm using the requests module to handle the GET/POST and BeautifulSoup to parse the HTML.
import requests
from bs4 import BeautifulSoup
# first get the login page
response = requests.get('https://www.site.com')
# requests transparently decompresses gzip/deflate responses,
# so the decoded HTML is available directly as response.text
html = response.text
# parse the html for the csrf token
soup = BeautifulSoup(html, 'html.parser')
csrf_token = soup.find('input', id='csrf_token')['value']
# now, submit the login data, including csrf token and the original cookie data
response = requests.post('https://www.site.com/login',
                         data={'csrf_token': csrf_token,
                               'username': 'username',
                               'password': 'ckrit'},
                         cookies=response.cookies)
login_result = response.text
print(login_result)
I can't say whether GAE will allow any of this, but it might at least help you figure out what your particular case requires. Also, as Carl points out, if a submit input is used to trigger the POST, you'd have to include it. In my particular example, this isn't required.
You're missing the hidden submit=login argument. Have you tried:
import urllib2, urllib
url = 'http://blah.com/member/index.bv'
values = {'submit' : 'login',
          'email' : 'someemail@gmail.com',
          'password' : 'somepassword'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
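For reference, a rough Python 3 equivalent might look like the sketch below, where `urllib2` and `urllib.urlencode` became `urllib.request` and `urllib.parse.urlencode`, and the POST body must be bytes. As an assumption beyond the answer above, it also sends the form's other hidden field, `lastplace`, since a browser would submit it too, and it uses a cookie-aware opener in case the server expects its session cookie back. The function names are hypothetical and `blah.com` is still the question's placeholder URL.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Placeholder URL from the question; replace with the real site.
LOGIN_URL = "http://blah.com/member/index.bv"


def build_login_request(url, email, password):
    """Build (but do not send) a POST request mirroring the login form."""
    values = {
        "submit": "login",    # hidden input in the form
        "email": email,
        "password": password,
        "lastplace": "%2F",   # other hidden input; sent literally, so it is encoded as %252F
    }
    # urlencode() returns str; urllib.request needs the body as bytes
    data = urllib.parse.urlencode(values).encode("ascii")
    # Supplying data makes urllib.request issue a POST
    return urllib.request.Request(url, data=data)


def make_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (performs network I/O, so it is left commented out):
# opener, jar = make_opener()
# opener.open(LOGIN_URL)  # pick up any session cookie first
# req = build_login_request(LOGIN_URL, "someemail@gmail.com", "somepassword")
# with opener.open(req) as response:
#     the_page = response.read()
```

Building the request separately from sending it makes it easy to inspect `req.data` and compare it against what your browser's developer tools show for a real login.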