
How to get the true URL of a file on the web (Python)

I notice that sometimes audio files on the internet have a "fake" URL.

http://garagaeband.com/3252243

And this will 302 to the real URL:

http://garageband.com/michael_jackson4.mp3

My question is: when supplied with the fake URL, how can I get the REAL URL from the headers?

Currently, this is my code for reading the headers of a file. I don't know whether it will get me what I want. How do I parse the "real" URL out of the response headers?

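import httplib
import urlparse
# Split the URL into host ("head") and path ("tail")
url = urlparse.urlsplit("http://garagaeband.com/3252243")
head, tail = url.netloc, url.path
conn = httplib.HTTPConnection(head)
conn.request("HEAD", tail)  # HEAD asks for the headers only
res = conn.getresponse()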

For example, this URL returns a 302 redirect: http://www.garageband.com/mp3cat/.UZCMYiqF7Kum/01_No_pierdas_la_fuente_del_gozo.mp3
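For reference, here is a minimal Python 3 sketch of the same HEAD approach (httplib was renamed http.client in Python 3). The helper name head_location is hypothetical. http.client sends exactly one request and never follows redirects, so for a 302 the target is simply the Location header, read with getheader(), which returns None when the header is missing. The sketch assumes a plain http or https URL with no credentials embedded in it.

```python
import http.client
from urllib.parse import urlsplit


def head_location(url, timeout=10):
    """Send a single HEAD request; return (status, Location header or None).

    http.client never follows redirects, so a 301/302 is returned as-is.
    """
    parts = urlsplit(url)
    # Choose the connection class from the scheme (assumes http or https).
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # netloc may carry an explicit port, e.g. "example.com:8080".
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        # Request target = path plus query string ("/" for a bare host).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        res = conn.getresponse()
        # getheader() matches names case-insensitively; None if missing.
        return res.status, res.getheader("Location")
    finally:
        conn.close()
```

Calling head_location("http://garagaeband.com/3252243") returns the status code and the Location value; when the status is 301, 302, 303, 307 or 308, that value is the redirect target. Keep in mind that Location may be a relative URL, so resolve it against the request URL (for example with urllib.parse.urljoin) before fetching it.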


Use urllib's geturl().

edit: Sorry, I haven't done this in a while:

import urllib
url = 'http://garagaeband.com/3252243'  # the "fake" URL
print urllib.urlopen(url).geturl()  # URL after any redirects

For example:

>>> import urllib2
>>> f = urllib2.urlopen("http://tinyurl.com/oex2e")
>>> f.geturl()
'http://www.amazon.com/All-Creatures-Great-Small-Collection/dp/B00006G8FI'
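In Python 3, urllib2's urlopen lives in urllib.request, and the returned response carries the URL reached after redirects in its url attribute (geturl() still works, though the documentation now points to url). A minimal sketch, with final_url as a hypothetical helper name:

```python
from urllib.request import urlopen


def final_url(url, timeout=10):
    """Return the URL that urllib ends up at after following redirects."""
    # urlopen's default opener includes HTTPRedirectHandler, so 301, 302,
    # 303, 307 and 308 responses are followed automatically.
    with urlopen(url, timeout=timeout) as resp:
        # resp.url is the final URL; the body is never read here.
        return resp.url
```

Note that this sends a GET, so the server may start sending the file; the code does not read the body and closes the connection as soon as it has the URL.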


Mark Pilgrim recommends httplib2 in "Dive Into Python 3", since it handles many things (including redirects) more intelligently.

>>> import httplib2
>>> h = httplib2.Http()
>>> response, content = h.request("http://garagaeband.com/3252243")
>>> response["content-location"]
'http://garageband.com/michael_jackson4.mp3'


You have to read the response, notice that you got a 302 (Found), take the real URL from the Location response header, and then fetch the resource from that new URI.
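The manual procedure in this answer can be sketched as a loop: request the URL, and while the status is a redirect (301, 302, 303, 307 or 308), resolve the Location header against the current URL and try again. Here follow_redirects and its head parameter are hypothetical: head stands for any function that takes a URL and returns (status, location), such as a wrapper around an http.client HEAD request, which also keeps the loop testable without network access. The hop limit guards against redirect loops.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, head, max_hops=10):
    """Follow Location headers by hand and return the final URL.

    `head(url)` must return (status, location_or_None) for one request.
    """
    start = url
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the real URL
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_hops} redirects starting from {start}")
```

Injecting head keeps the redirect logic separate from the HTTP library, so the same loop works with httplib, http.client or anything else that can report a status and a Location header.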


I solved it:

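import urllib2
theurl = 'garagaeband.com/3252243'  # URL without the "http://" prefix
req = urllib2.Request('http://' + theurl)
opener = urllib2.build_opener()  # the default opener follows redirects
f = opener.open(req)
print 'the real url is......' + f.url  # f.url is the URL after redirects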
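If you also want every intermediate URL and not just the final one, a Python 3 variant of this approach can pass a custom redirect handler to build_opener. This is a sketch: RecordingRedirectHandler and real_url are hypothetical names. It relies on urllib.request.HTTPRedirectHandler.redirect_request, which urllib calls once per redirect with the new URL already resolved against the request URL.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every (status, URL) hop it follows."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is absolute here; record it, then let urllib redirect.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def real_url(theurl, timeout=10):
    """Return (final URL, list of redirect hops) for a URL without "http://"."""
    handler = RecordingRedirectHandler()
    # Passing a subclass instance replaces the default redirect handler.
    opener = urllib.request.build_opener(handler)
    with opener.open("http://" + theurl, timeout=timeout) as f:
        return f.url, handler.hops
```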