How to download a file returned "indirectly" from an HTML form submission? (Python, urllib, urllib2, etc.)

EDIT: Problem solved. It ultimately turned out to be a matter of "http:" instead of "https:" in the URL (just a stupid mistake on my part). But it was the nice clean code example from cetver that helped me isolate the problem. Thanks to all who offered suggestions.
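
For readers who land here with the same symptom, the fix described above can be sketched as follows. This is Python 3, where urllib.parse provides the URL helpers, and force_https is a hypothetical helper name, not something from the original program:

```python
from urllib.parse import urlsplit, urlunsplit, urljoin

def force_https(url):
    """Return url with its scheme switched to https (other parts untouched)."""
    parts = urlsplit(url)
    return urlunsplit(("https",) + tuple(parts[1:]))

# urljoin avoids the accidental double slash that plain "+" concatenation
# produces when the base ends in "/" and the path starts with "/".
base = force_https("http://www.virwox.com/")
orders_url = urljoin(base, "orders.php")
print(orders_url)
```

Building every URL from one https base also makes it harder for a stray "http:" to slip into a single request.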

Putting this url in firefox triggers the appropriate download and save-as dialog:

https://www.virwox.com/orders.php?download_open=Download&format_open=.xls

The above link is the same as submitting the form with the "Download" button on the page https://www.virwox.com/orders.php.

Here is the relevant html for the form that generates the above url:

<form action='orders.php' method='get'><fieldset><legend>Open Orders (2):</legend>
  <input type='submit' value='Download' name='download_open' /> 
  <select name='format_open'>
    <option value='.xls'>.xls</option>
    <option value='.csv'>.csv</option>
    <option value='.xml'>.xml</option></select>
</form>
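
Because the form declares method='get', a browser submits it by appending the field names and values to the action URL as a query string. A minimal Python 3 sketch of that mapping (urllib.parse.urlencode is the Python 3 home of the urlencode used in this question; build_form_url is a hypothetical helper):

```python
from urllib.parse import urlencode, urljoin

def build_form_url(page_url, action, fields):
    """Resolve a form's action against its page URL and append GET fields."""
    return urljoin(page_url, action) + "?" + urlencode(fields)

url = build_form_url(
    "https://www.virwox.com/orders.php",   # page containing the form
    "orders.php",                          # the form's action attribute
    {"download_open": "Download", "format_open": ".xls"},
)
print(url)
```

The result is the same URL that triggers the download when pasted into the browser.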

But when I try the following python code (which I sort of expected would not work)...

# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()
print(openOrders_xml)

...openOrders_xml ends up just being the original page (https://www.virwox.com/orders.php).
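
One detail worth checking in the snippet above, whatever else is going on: when a data argument is passed to Request, urllib2 sends a POST, whereas the form uses method='get'. The same rule holds in Python 3's urllib.request, which this sketch uses. No network traffic is involved; it only inspects the request objects:

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://www.virwox.com/orders.php"
params = urlencode({"download_open": "Download", "format_open": ".xml"})

# Passing the encoded fields as data turns the request into a POST.
as_post = Request(url, data=params.encode("ascii"))

# Appending them to the URL keeps it a GET, like the browser's form submit.
as_get = Request(url + "?" + params)

print(as_post.get_method())
print(as_get.get_method())
print(as_get.full_url)
```

The first request reports POST and the second GET. Whether the server treats the two differently is not something this sketch can tell you.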

How does firefox know there is also a file to be downloaded, and how do I detect and download this file in Python?
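
A browser usually decides between rendering a response and offering a save-as dialog by looking at the response headers, most often Content-Disposition: attachment (optionally with a filename) together with the Content-Type. Whether this particular site sends those headers is an assumption, and the header values below are made up for illustration. A Python 3 sketch of the check, using the standard library's email.message.Message to parse the header:

```python
from email.message import Message

def describe_download(headers):
    """Return (is_attachment, filename) for a mapping of response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    is_attachment = msg.get_content_disposition() == "attachment"
    filename = msg.get_filename() if is_attachment else None
    return is_attachment, filename

# Hypothetical headers for a file download and for an ordinary HTML page.
download_headers = {
    "Content-Type": "application/vnd.ms-excel",
    "Content-Disposition": 'attachment; filename="orders.xls"',
}
page_headers = {"Content-Type": "text/html; charset=utf-8"}

print(describe_download(download_headers))
print(describe_download(page_headers))
```

In Python 3 the headers attribute of a urllib.request response is itself a Message subclass, so the same two methods can also be called on it directly.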

Please note that this is not a security/login issue, as I would not even be able to get the orders.php page if I was having authentication trouble.

EDIT: I am wondering if this has something to do with redirection (I am using the basic redirection handler), or whether I should be using something like urllib.urlretrieve().

EDIT: here is the code for the complete program, just in case it is relevant...

import urllib
import urllib2
import cookielib
import pprint

from urllib import urlencode

username=###############
password=###############

virwoxTopLevel_url = "http://www.virwox.com/"

overview_url = "https://www.virwox.com/index.php"


# Header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }

# Handlers...
# cookie handler...
cookie_handler= urllib2.HTTPCookieProcessor( cookielib.CookieJar() )
# redirect handler...
redirect_handler= urllib2.HTTPRedirectHandler()

# create "opener" (OpenerDirector instance)
virwox_opener = urllib2.build_opener(redirect_handler,cookie_handler)

# login
login_url = "https://www.virwox.com/index.php"
values = { 'uname' : username, 'password' : password }
login_data = urllib.urlencode(values)
login_request = urllib2.Request(login_url,login_data,headers)
login_response = virwox_opener.open(login_request)

overview_html = login_response.read();

virwox_json_url = "http://api.virwox.com/api/json.php"
getTest = urllib.urlencode( { "method":"getMarketDepth", "symbols[0]":"EUR/SLL","symbols[1]":"USD/SLL","buyDepth":1,"sellDepth":1,"id":1 } )
get_response = urllib2.urlopen(virwox_json_url,getTest)
#print get_response.read()

# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()

# the following prints the html of the /orders.php page not the desired download data:
print "******************************************"
print(openOrders_xml)

print "******************************************"
print openOrders_response.info()
print openOrders_response.geturl()
print "******************************************"
# the following prints nothing, I assume because without the cookie handler it fails to authenticate
#  (note that authentication is done by the PHP program, not HTTP authentication, so there is no "authentication handler" above)
print urllib2.urlopen("https://www.virwox.com/orders.php?download_open=Download&format_open=.xml").read()


CODE BELOW ISN'T TESTED

something like:

import urllib, urllib2

HOST = 'https://www.virwox.com'
FORMS = {
    'login': {
        'action': HOST + '/index.php',
        'data': urllib.urlencode( {
            'uname':'username', 
            'password':'******'
        } )
    },
    'orders': {
        'action': HOST + '/orders.php',
        'data': urllib.urlencode( {
            'download_open':'Download', 
            'format_open':'.xml'
        } )
    }
}

opener = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
try:
    req = urllib2.Request( url = FORMS['login']['action'], data = FORMS['login']['data'] )
    opener.open( req ) #save login cookie
    print 'Login: OK'
except Exception, e:
    print 'Login: Fail'
    print e   
try:
    req = urllib2.Request( url = FORMS['orders']['action'], data = FORMS['orders']['data'] )
    print 'Orders Page: OK'
except Exception, e:
    print 'Orders Page: Fail'
    print e
try:
    xml = opener.open( req ).read()
    print xml
except Exception, e:
    print 'Obtain XML: Fail'
    print e


Looks like your question is already answered, but you might like to take a look at the Requests package. It is basically a nice wrapper around the standard lib tools. The following (probably) does what you want.

import requests

r = requests.get('http://www.virwox.com/orders.php', 
    allow_redirects=True,
    auth=('user', 'pass'), 
    # the form uses method='get', so the fields go in the query string
    params={'download_open': 'Download', 'format_open': '.xls'})

print r.content


You may need a urllib2.HTTPPasswordMgr like this (untested since I don't have your uname/pw):

import urllib
import urllib2

uri = "http://www.virwox.com/"
url = uri + "orders.php"
uname = "USERNAME"
password = "PASSWORD"
post = urllib.urlencode({"download_open":"Download", "format_open":".xls"})
# realm=None only acts as a catch-all with HTTPPasswordMgrWithDefaultRealm
pwMgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
pwMgr.add_password(realm=None, uri=uri, user=uname, passwd=password)
urllib2.install_opener(urllib2.build_opener(urllib2.HTTPDigestAuthHandler(pwMgr)))
req = urllib2.Request(url, post)
s = urllib2.urlopen(req)
cookie = s.headers['Set-Cookie']
s.close()

req.add_header('Cookie', cookie)

s = urllib2.urlopen(req)
source = s.read()
s.close()

Then, you can:

print source

to see if it contains the xml data you need.
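
That check can also be done programmatically rather than by eye. A rough Python 3 sketch, assuming the export is well-formed XML whose root element is not <html>; the sample payloads and the looks_like_xml_export name are hypothetical:

```python
import xml.etree.ElementTree as ET

def looks_like_xml_export(payload):
    """True if payload parses as XML and its root is not an HTML page."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return False
    # Strip a namespace such as {http://www.w3.org/1999/xhtml} before comparing.
    tag = root.tag.rsplit("}", 1)[-1].lower()
    return tag != "html"

print(looks_like_xml_export(b"<orders><order id='1'/></orders>"))
print(looks_like_xml_export(b"<html><body><p>Open Orders</body></html>"))
```

Most real HTML pages fail the XML parse outright, and the root-tag comparison covers the ones that happen to be well-formed.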
