How to get the true URL of a file on the web. (Python)
I've noticed that audio files on the internet sometimes have a "fake" URL, for example:
http://garagaeband.com/3252243
And this will 302 to the real URL:
http://garageband.com/michael_jackson4.mp3
My question is: when supplied with the fake URL, how can you get the real URL from the headers?
Currently, this is my code for reading the headers of a file. I don't know if it will accomplish what I want. How do I parse the "real" URL out of the response headers?
import httplib
conn = httplib.HTTPConnection(head)
conn.request("HEAD",tail)
res = conn.getresponse()
This has a 302 redirect: http://www.garageband.com/mp3cat/.UZCMYiqF7Kum/01_No_pierdas_la_fuente_del_gozo.mp3
Use geturl() on the object returned by urllib.urlopen():
import urllib
urllib.urlopen(url).geturl()
For example:
>>> import urllib2
>>> f = urllib2.urlopen("http://tinyurl.com/oex2e")
>>> f.geturl()
'http://www.amazon.com/All-Creatures-Great-Small-Collection/dp/B00006G8FI'
>>>
Mark Pilgrim recommends httplib2 in "Dive Into Python 3" because it handles many things (including redirects) in a smarter way.
>>> import httplib2
>>> h = httplib2.Http()
>>> response, content = h.request("http://garagaeband.com/3252243")
>>> response["content-location"]
'http://garageband.com/michael_jackson4.mp3'
You have to read the response, check whether its status is 302 (Found), take the real URL from the Location response header, and then fetch the resource using that new URI.
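The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses the Python 3 module names (http.client and urllib.parse; in Python 2 they are httplib and urlparse), the helper names redirect_target and resolve_once are made up for this example, and it follows a single redirect for plain http/https URLs.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes that signal a redirect carrying a Location header.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(request_url, status, location):
    """Return the URL a response redirects to, or None if it does not.

    Hypothetical helper: `status` is the numeric status code and
    `location` is the value of the Location header (or None).
    """
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the request URL.
        return urljoin(request_url, location)
    return None


def resolve_once(url):
    """Send a HEAD request and follow at most one redirect (sketch)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc)
    else:
        conn = http.client.HTTPConnection(parts.netloc)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        target = redirect_target(url, res.status, res.getheader("Location"))
    finally:
        conn.close()
    # No redirect: the URL we were given is already the real one.
    return target or url
```

Calling resolve_once("http://garagaeband.com/3252243") would return the value of the Location header if the server answers with a redirect, and the original URL otherwise. To download the file itself, issue a normal GET against the returned URL.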
I solved it.
import urllib2
req = urllib2.Request('http://' + theurl)
opener = urllib2.build_opener()
f = opener.open(req)
print 'the real url is......' + f.url
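For anyone on Python 3, where urllib2 no longer exists as a separate module, roughly the same approach might look like the sketch below. The function name final_url is made up for this example; urllib.request.urlopen follows HTTP redirects such as 302 by default, and geturl() reports the URL that was actually retrieved.

```python
import urllib.request


def final_url(url):
    """Open `url`, let urllib follow any redirects, and return where it ended up."""
    with urllib.request.urlopen(url) as f:
        # geturl() is the URL of the resource actually retrieved,
        # which differs from `url` if a redirect was followed.
        return f.geturl()
```

A URL that is not redirected simply comes back unchanged.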