how to download a file returned "indirectly"(??) from html form submission? (python, urllib, urllib2, etc.)

2023-04-09 04:55 问答作者：

EDIT: Problem solve. Ultimately it turned out to be a matter of "http:" instead of "https:" in the url (just a stupid mistake on my part). But it was the nice clean code example from cetver that helped me isolate the problem. Thanks to all who offered suggestions.

Putting this url in firefox triggers the appropriate download and save-as dialog:

https://www.virwox.com/orders.php?download_open=Download&format_open=.xls

The above link is same as submitting form with a "download" button of form on the page https://www.virwox.com/orders.php.

Here is the relevant html for the form that generates the above url:

<form action='orders.php' method='get'><fieldset><legend>Open Orders (2):</legend>
  <input type='submit' value='Download' name='download_open' /> 
  <select name='format_open'>
    <option value='.xls'>.xls</option>
    <option value='.csv'>.csv</option>
    <option value='.xml'>.xml</option></select>
</form>

But when I try the following python code (which I sort of expected would not work)...

# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()
print(openOrders_xml)

...openOrders_xml ends up just being the original page (https://www.virwox.com/orders.php).

How does firefox know there is also a file to be downloaded, and how do I detect and download this file in Python?

Please note that this is not a security/login issue, as I would not even be able to get the orders.php page if I was having authentication trouble.

EDIT: I am wondering if this has something to do with redirection (I am using the basic redirection handler) or maybe I should be using something liek urllib.fileretrieve().

EDIT: here is code for complete program, just in case is relevant...

import urllib
import urllib2
import cookielib
import pprint

from urllib import urlencode

username=###############
password=###############

virwoxTopLevel_url = "http://www.virwox.com/"

overview_url = "https://www.virwox.com/index.php"


# Header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }

# Handlers...
# cookie handler...
cookie_handler= urllib2.HTTPCookieProcessor( cookielib.CookieJar() )
# redirect handler...
redirect_handler= urllib2.HTTPRedirectHandler()

# create "opener" (OpenerDirector instance)
virwox_opener = urllib2.build_opener(redirect_handler,cookie_handler)

# login
login_url = "https://www.virwox.com/index.php"
values = { 'uname' : username, 'password' : password }
login_data = urllib.urlencod开发者_Python百科e(values)
login_request = urllib2.Request(login_url,login_data,headers)
login_response = virwox_opener.open(login_request)

overview_html = login_response.read();

virwox_json_url = "http://api.virwox.com/api/json.php"
getTest = urllib.urlencode( { "method":"getMarketDepth", "symbols[0]":"EUR/SLL","symbols[1]":"USD/SLL","buyDepth":1,"sellDepth":1,"id":1 } )
get_response = urllib2.urlopen(virwox_json_url,getTest)
#print get_response.read()

# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()

# the following prints the html of the /orders.php page not the desired download data:
print "******************************************"
print(openOrders_xml)

print "******************************************"
print openOrders_response.info()
print openOrders_response.geturl()
print "******************************************"
# the following prints nothing, i assume because without the cookie handler, fails to authenticate
#  (note that authentication is by the php program, not html authentication, so no "authentication hangler" above
print urllib2.urlopen("https://www.virwox.com/orders.php?download_open=Download&format_open=.xml").read()

CODE BELLOW ISN'T TESTED

something like:

import urllib, urllib2,

HOST = 'https://www.virwox.com'
FORMS = {
    'login': {
        'action': HOST + '/index.php',
        'data': urllib.urlencode( {
            'uname':'username', 
            'password':'******'
        } )
    },
    'orders': {
        'action': HOST + '/orders.php',
        'data': urllib.urlencode( {
            'download_open':'Download', 
            'format_open':'.xml'
        } )
    }
}

opener = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
try:
    req = urllib2.Request( url = FORMS['login']['action'], data = FORMS['login']['data'] )
    opener.open( req ) #save login cookie
    print 'Login: OK'
except Exception, e:
    print 'Login: Fail'
    print e   
try:
    req = urllib2.Request( url = FORMS['orders']['action'], data = FORMS['orders']['data'] )
    print 'Orders Page: OK'
except Exception, e:
    print 'Orders Page: Fail'
    print e
try:
    xml = opener.open( req ).read()
    print xml
except Exception, e:
    print 'Obtain XML: Fail'
    print e

Looks like your question is already answered, but you might like to take a look at the Requests package. It is basically a nice wrapper around the standard lib tools. The following (probably) does what you want.

import requests

r = requests.get('http://www.virwox.com/orders.php', 
    allow_redirects=True,
    auth=('user', 'pass'), 
    data={'download_open': 'Download', 'format_open': '.xls'})

print r.content

You may need a urllib2.HTTPPasswordMgr like this (untested since I dont have your uname/pw):

import urllib
import urllib2

uri = "http://www.virwox.com/"
url = uri + "orders.php"
uname = "USERNAME"
password = "PASSWORD"
post = urllib.urlencode({"download_open":"Download", "format_open":".xls"})
pwMgr = urllib2.HTTPPasswordMgr()
pwMgr.add_password(realm=None, uri=uri, user=uname, passwd=password)
urllib2.install_opener(urllib2.build_opener(urllib2.HTTPDigestAuthHandler(pwMgr)))
req = urllib2.Request(url, post)
s = urllib2.urlopen(req)
cookie = s.headers['Set-Cookie']
s.close()

req.add_header('Cookie', cookie)

s = urllib2.urlopen(req)
source = s.read()
s.close()

Then, you can:

print source

to see if it contains the xml data you need.

继续阅读：download forms python urllib urllib2

how to download a file returned "indirectly"(??) from html form submission? (python, urllib, urllib2, etc.)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？