Downloading JavaScript File from Website using Python
I am trying to use Python to download the results from the following website:
http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY
I was attempting to use mechanize before I realized that the Download File link is written in JavaScript, which mechanize does not support. My code so far opens the web page, as shown below. I am stuck on how to reach the Download File link on the page so that I can save the data to my machine.
import urllib2

def downloadFile():
    url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
    t = urllib2.urlopen(url)
    s = t.read()
    print s

downloadFile()
The results that are printed are:
<html>
<head></head>
<body>
<form name="apiForm" method="POST">
<input type="hidden" name="rowids">
<input type="hidden" name="annot">
<script type="text/javascript">
document.apiForm.rowids.value="4791928,3403495,...."; //There are really about 500 values
document.apiForm.annot.value="48";
document.apiForm.action = "chartReport.jsp";
document.apiForm.submit();
</script>
</form>
</body>
</html>
Does anybody know how I can select and move to the Download File page and save that file to my computer?
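After some more research on that link, I came up with this. You can definitely use mechanize to do it: the script on the page only fills in two hidden form fields, sets the form action, and submits the form, so you can read those values out of the page source, set them yourself, and submit the form from mechanize.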
import mechanize

def getJSVariableValue(content, variable):
    # Return the first double-quoted string that follows `variable` in the page source.
    value_start_index = content.find(variable)
    value_start_index = content.find('"', value_start_index) + 1
    value_end_index = content.find('"', value_start_index)
    return content[value_start_index:value_end_index]

def getChartReport(url):
    br = mechanize.Browser()
    resp = br.open(url)
    content = resp.read()
    # Fill in the hidden fields by hand, since mechanize does not run the page's JavaScript.
    br.select_form(name='apiForm')
    br.form.set_all_readonly(False)
    br.form['rowids'] = getJSVariableValue(content, 'document.apiForm.rowids.value')
    br.form['annot'] = getJSVariableValue(content, 'document.apiForm.annot.value')
    br.form.action = 'http://david.abcc.ncifcrf.gov/' + getJSVariableValue(content, 'document.apiForm.action')
    print br.form['rowids']
    print br.form['annot']
    br.submit()
    # The report page contains a "Download File" link; follow it and save the response.
    resp = br.follow_link(text_regex=r'Download File')
    content = resp.read()
    with open('output.txt', 'w') as f:
        f.write(content)
    return content

url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
chart_output = getChartReport(url)
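If you would rather not depend on mechanize, the same idea can be sketched with the standard library alone. This is only a rough sketch and it assumes Python 3 (the code above is Python 2). The helper names `extract_form_values` and `build_post_request` are made up for illustration, and the sample HTML is a shortened copy of the page shown in the question. It pulls the values the script assigns to `document.apiForm` out of the page source and builds the POST request the browser would have sent. It does not cover the later step of following the Download File link.

```python
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Matches assignments such as:
#   document.apiForm.rowids.value="1,2,3";
#   document.apiForm.action = "chartReport.jsp";
ASSIGNMENT_RE = re.compile(
    r'document\.apiForm\.(\w+)(?:\.value)?\s*=\s*"([^"]*)"'
)


def extract_form_values(html):
    """Return a dict mapping each assigned apiForm name to its string value."""
    return dict(ASSIGNMENT_RE.findall(html))


def build_post_request(page_url, html):
    """Build (but do not send) the POST that the page's script would submit."""
    values = extract_form_values(html)
    # 'action' is the form target, not a form field, so keep it out of the body.
    action = values.pop('action')
    body = urlencode(values).encode('ascii')
    # Resolve the relative action against the URL of the page it came from.
    return Request(urljoin(page_url, action), data=body)


# Shortened copy of the script block from the question (illustrative values).
sample_html = '''
<script type="text/javascript">
document.apiForm.rowids.value="4791928,3403495";
document.apiForm.annot.value="48";
document.apiForm.action = "chartReport.jsp";
document.apiForm.submit();
</script>
'''

request = build_post_request(
    'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION',
    sample_html,
)
print(request.get_method(), request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and read the response. Whether the server also expects cookies from the first page load is something I have not checked, so treat that as an open question.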