
Determining redirected URL in Python

I made a little parser using HTMLParser and I would like to know where a link is redirected. I don't know how to explain this, so please look at this example:

In my page's source there is a link, http://www.myweb.com?out=147, which redirects to http://www.mylink.com. I can parse http://www.myweb.com?out=147 without any problems, but I don't know how to get http://www.mylink.com.


You can use urllib2 (urllib.request in Python 3) and its HTTPRedirectHandler in order to find out where a URL will redirect you. Here's a function that does that:

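import urllib2

def get_redirected_url(url):
    # build_opener() already includes HTTPRedirectHandler by default;
    # passing it explicitly just makes the intent obvious.
    opener = urllib2.build_opener(urllib2.HTTPRedirectHandler)
    response = opener.open(url)
    # response.url (also response.geturl()) is the URL after all redirects.
    return response.url

print get_redirected_url("http://google.com/")
# prints "http://www.google.com/"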
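For Python 3, where urllib2 became urllib.request, a roughly equivalent sketch follows. It relies on the fact that urlopen's default opener already includes HTTPRedirectHandler; the timeout parameter is an added assumption, not part of the original answer.

```python
import urllib.request

def get_redirected_url(url, timeout=10):
    """Return the final URL after urllib has followed any HTTP redirects."""
    # urlopen() uses a default opener that already contains
    # HTTPRedirectHandler, so standard 3xx redirects are followed for us.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() reports the URL of the resource actually retrieved,
        # i.e. the last hop of the redirect chain.
        return response.geturl()

# Example (needs network access):
# print(get_redirected_url("http://www.myweb.com?out=147"))
```

Note that urlopen raises urllib.error.HTTPError if the final response is an error status (4xx/5xx), so wrap the call in try/except if that case matters to your parser.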


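You cannot get hold of the redirect URL by parsing the HTML source. An HTTP redirect is issued by the server in its response, not by the page's markup (client-side redirects such as a <meta http-equiv="refresh"> tag or JavaScript are the exception, and those do appear in the HTML). You need to make an HTTP request to the URL and inspect the server's response: a 3xx status code (for example 301, 302, 303, 307 or 308) signals a redirect, and the new URL is given in the Location header.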
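The check this answer describes can be done by hand with the standard library's http.client: send one request without following redirects, look for a 3xx status, and read the Location header. This is a minimal Python 3 sketch, assuming the server answers a HEAD request the same way it answers GET (some servers only redirect on GET, in which case change the method); `get_redirect_target` and `REDIRECT_CODES` are hypothetical names chosen for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes whose Location header points at the new URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def get_redirect_target(url, timeout=10):
    """Send a single HEAD request without following redirects.

    Returns the absolute redirect target, or None if the server did not
    answer with a 3xx redirect.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    # Rebuild the request target (path plus query string) for the request line.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            location = response.getheader("Location")
            if location:
                # Location may be relative; resolve it against the request URL.
                return urljoin(url, location)
        return None
    finally:
        conn.close()
```

Unlike the urllib approach, this reports only the first hop; call it again on the returned URL if you need to walk a whole redirect chain.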
