Python urllib2 automatic form filling and retrieval of results
I want to query a site for warranty information about the machine this script is running on. The script should be able to fill out a form if needed (as on HP's service site, for example) and then retrieve the resulting web page.
I already have the code in place to parse the HTML that comes back. What I'm having trouble with is how to POST the data that belongs in the form fields and then retrieve the resulting page.
If you absolutely need to use urllib2, the basic gist is this:
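import urllib
import urllib2
url = 'http://whatever.foo/form.html'
form_data = {'field1': 'value1', 'field2': 'value2'}
# Encode the dict as a URL-encoded string (key=value pairs joined by &)
params = urllib.urlencode(form_data)
# Passing data as the 2nd argument makes this a POST
response = urllib2.urlopen(url, params)
# The response is file-like; read() returns the page source
data = response.read()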
If you send along POST data (the 2nd argument to urlopen()), the request method is automatically set to POST.
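To see that rule in action without touching the network, you can build the request object and ask it which method it would use. This is only a minimal sketch, and it uses the Python 3 spelling (there, urllib2's Request lives in urllib.request and urlencode in urllib.parse). The URL and field names are just the placeholders from above.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://whatever.foo/form.html'  # placeholder URL, as above
form_data = {'field1': 'value1', 'field2': 'value2'}

# Python 3 wants the request body as bytes, hence the encode().
params = urlencode(form_data).encode('ascii')

get_request = Request(url)           # no data: plain GET
post_request = Request(url, params)  # data supplied: becomes POST

print(get_request.get_method())
print(post_request.get_method())
```

Nothing is sent here. The request only goes out once you pass it to urlopen().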
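I suggest you do yourself a favor and use mechanize, a full-blown urllib2 replacement that behaves much like a real browser (although it doesn't run JavaScript). A lot of sites use hidden fields, cookies, and redirects. urllib2 follows redirects by default, but it doesn't keep cookies unless you install a cookie handler, and it knows nothing about hidden form fields. mechanize handles all of these for you.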
Check out Emulating a browser in Python with mechanize for a good example.
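If you'd rather stay with the standard library, the cookie part at least can be covered with a cookie jar. A minimal sketch, assuming Python 3 names (http.cookiejar and urllib.request; in Python 2 the same pieces are cookielib and urllib2). The URLs in the comments are hypothetical.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# The jar stores cookies from responses and sends them back on later requests.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

# Hypothetical usage (needs a real site, so it is left commented out);
# params would be the urlencoded form data as bytes:
# opener.open('http://whatever.foo/form.html')          # picks up session cookies
# response = opener.open('http://whatever.foo/submit', params)  # replays them
print(len(cookie_jar))
```

Every request has to go through the same opener, otherwise the cookies are not shared. This still does nothing about hidden fields, which you'd have to pull out of the form HTML yourself.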
Using urllib and urllib2 together:
data = urllib.urlencode([('field1', val1), ('field2', val2)])  # list of two-element tuples
content = urllib2.urlopen('post-url', data).read()
content will then hold the page source. (urlopen() itself returns a file-like response object, so you need the read() call.)
I’ve only done a little bit of this, but:
- You’ve got the HTML of the form page. Extract the name attribute for each form field you need to fill in.
- Create a dictionary mapping the name of each form field to the value you want to submit.
- Use urllib.urlencode to turn the dictionary into the body of your POST request.
- Include this encoded data as the second argument to urllib2.Request(), after the URL that the form should be submitted to, and pass the resulting request to urllib2.urlopen().
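The steps above can be sketched roughly like this. It's written in the Python 3 spelling (html.parser, urllib.parse, urllib.request), and the FormFieldParser class, the sample form, the URL and the serial number are all made up for illustration. It only looks at input tags, so select and textarea fields would need extra handling.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        name = attributes.get('name')
        if name:
            # Keep any preset value, e.g. for hidden fields.
            self.fields[name] = attributes.get('value') or ''


# Made-up form page standing in for the HTML you already downloaded.
form_html = '''
<form action="/lookup" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="serial">
  <input type="submit" value="Go">
</form>
'''

# Step 1: extract the field names (and any preset values).
parser = FormFieldParser()
parser.feed(form_html)

# Step 2: map field names to the values you want to submit.
form_data = dict(parser.fields)
form_data['serial'] = 'XYZ-0001'

# Step 3: encode the dictionary as the POST body (bytes in Python 3).
body = urlencode(form_data).encode('ascii')

# Step 4: build the request; urlopen(request) would actually send it.
request = Request('http://whatever.foo/lookup', body)
print(request.get_method(), body)
```

Starting from the fields found in the page, rather than a hand-written dictionary, means hidden fields like the token get sent back along with your own values.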
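The server will either return a resulting web page, or return a redirect to a resulting web page. In the latter case, urllib2's default opener normally follows the redirect for you and fetches the target with a GET request. You only need to issue that GET yourself, to the URL given in the redirect response, if you handle redirects manually.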
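A quick way to see what happens on a redirect is to try it against a throwaway local server. This is a minimal sketch in the Python 3 spelling (http.server and urllib.request); the DemoHandler class, the /submit and /result paths and the serial number are made up. Here the default redirect handling follows the 303 that answers the POST and fetches the target with GET, so final_url ends up pointing at /result.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import ProxyHandler, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: POST /submit redirects to GET /result."""

    def do_POST(self):
        # Read and discard the form body, then redirect.
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)
        self.send_response(303)
        self.send_header('Location', '/result')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        page = b'<html>warranty result</html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = 'http://127.0.0.1:%d' % server.server_port

# An empty ProxyHandler keeps proxy settings from interfering with localhost.
opener = build_opener(ProxyHandler({}))
body = urlencode({'serial': 'XYZ-0001'}).encode('ascii')

response = opener.open(base_url + '/submit', body, timeout=5)
final_url = response.geturl()  # URL of the page that was finally fetched
page = response.read()
response.close()
server.shutdown()
server.server_close()

print(final_url)
print(page)
```

Comparing geturl() with the URL you posted to is a cheap way to tell whether a redirect happened.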
I hope that makes some sort of sense?