How do I access the original response headers that contain a redirect when using urllib2.urlopen?

I'm trying to parse the Location header of an HTTP response returned by urllib2.urlopen, but the only response headers I receive are those of the redirect target, not those of the original response that contains the Location header.

I have followed other questions on Stack Overflow that suggest subclassing urllib2.HTTPRedirectHandler, but I still don't understand how to access the original response whose redirect urlopen ends up following.

Here's an example of the problem:

import urllib2

req = urllib2.urlopen("http://wp.me")

print req.info()

The output of print contains the response headers of the target of the redirected request. I would like to see the original.

Any help would be appreciated.


urllib2 follows redirects transparently, but as you said, you can subclass HTTPRedirectHandler and install it in an opener to capture the values you need.

import urllib2

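class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Let the base class follow the redirect; result is the final response.
        result = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp,
                                                            code, msg,
                                                            headers)
        # Attach the redirect's status code and headers to the final response.
        result.status = code
        result.headers = headers
        return result

    # The base class aliases 301, 303 and 307 to its own http_error_302, so
    # re-alias them here; otherwise only 302 redirects reach this override.
    http_error_301 = http_error_303 = http_error_307 = http_error_302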

request = urllib2.Request("http://wp.me")
opener = urllib2.build_opener(SmartRedirectHandler())
obj = opener.open(request)
print 'The original headers were', obj.headers
print 'The Redirect Code was', obj.status

Any other attribute you set on result inside SmartRedirectHandler is likewise available on the object that opener.open returns.
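
For readers on Python 3, where urllib2 became urllib.request, the same idea might look roughly like the sketch below. It is not part of the original answer: the attribute names redirect_status and redirect_headers and the helper open_with_redirect_info are hypothetical names chosen for illustration, and the redirect data is stored under separate names so that the final response's own status and headers stay intact.

```python
import urllib.request


class SmartRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember what the redirect response said."""

    def http_error_302(self, req, fp, code, msg, headers):
        # Let the base class follow the redirect; result is the final response
        # (or None if the redirect response carried no Location/URI header).
        result = super().http_error_302(req, fp, code, msg, headers)
        if result is not None:
            # Hypothetical attribute names: kept separate from the final
            # response's own .status and .headers so neither is overwritten.
            result.redirect_status = code
            result.redirect_headers = headers
        return result

    # Route the other redirect codes through the override as well; the base
    # class aliases point at its own http_error_302, not at this one.
    http_error_301 = http_error_303 = http_error_307 = http_error_308 = http_error_302


def open_with_redirect_info(url):
    """Open url with an opener that uses SmartRedirectHandler (illustrative helper)."""
    opener = urllib.request.build_opener(SmartRedirectHandler())
    return opener.open(url)


if __name__ == "__main__":
    response = open_with_redirect_info("http://wp.me")
    # The attributes only exist if at least one redirect was followed.
    print("Redirect code:", getattr(response, "redirect_status", None))
    print("Original headers:", getattr(response, "redirect_headers", None))
```

If a request passes through a chain of redirects, the outermost call sets the attributes last, so they describe the first redirect in the chain.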
