Downloading Files with Python Urllib, Urllib2
I am trying to download files from a website using urllib as described in this thread: link text
import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
I am able to download the files (mostly PDFs), but all I get are corrupted files that cannot be opened. I suspect it's because the website requires a login.
How can the above function be modified to handle cookies? I already know the names of the form fields that carry the username & password information. When I print the return values of urlretrieve I get messages like:
a, b = urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print a, b
>> **cache-control:** no-cache, no-store, must-revalidate, s-maxage=300, proxy-revalidate
>> **connection:** close
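For reference, the second return value is a headers object, so the type of what was actually downloaded can be inspected directly. A minimal sketch, reusing the placeholder URL from above:

import urllib

# urlretrieve returns (local_filename, headers); for HTTP the headers
# object is a mimetools.Message, so we can ask for the Content-Type.
filename, headers = urllib.urlretrieve(
    "http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print headers.gettype()  # "text/html" would suggest a login page came back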
I am able to manually download the files if I enter their URLs in the browser. Thanks
First, urllib2 actually supports cookies, and cookie handling should be easy. Second, you can check what kind of file you have downloaded: e.g., AFAIK every MP3 file starts with the bytes "ID3".
import cookielib, urllib2

# The CookieJar keeps any cookies the server sets; the opener sends
# them back on every subsequent request made through it.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
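Putting both ideas together, here is a rough sketch of logging in first and then checking the magic bytes of what came back. The login URL and the form field names (user and password here) are made up; substitute the ones from the site's actual login form:

import cookielib, urllib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Hypothetical login URL and field names -- replace with the real ones.
login_data = urllib.urlencode({"user": "me", "password": "secret"})
opener.open("http://www.example.com/login", login_data)  # POST sets the session cookie

# Fetch the file through the same opener so the cookie is sent along.
data = opener.open("http://www.example.com/songs/mp3.mp3").read()
open("mp3.mp3", "wb").write(data)

# Sanity check: MP3s usually start with "ID3", PDFs with "%PDF".
print repr(data[:4])

If the first bytes read like "<htm", the server is still handing you the login page rather than the file.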
It might be possible that the server you are requesting from expects certain request headers, such as User-Agent. You can try mimicking browser behavior by sending additional headers.
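Something along these lines should do it; the User-Agent string below is just an example of a browser-like value:

import urllib2

# Pretend to be a regular browser by supplying a User-Agent header.
req = urllib2.Request(
    "http://www.example.com/songs/mp3.mp3",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) "
                           "Gecko/20100101 Firefox/10.0"})
data = urllib2.urlopen(req).read()

A Request object can also be passed to the cookie-aware opener from above (opener.open(req)), so both tricks can be combined.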