Why does Python's urllib2.urlopen() raise an HTTPError for successful status codes?

2023-03-27 06:16 问答作者：

According to the urllib2 documentation,

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.

And yet the following code

request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)

raises an HTTPError with code 201 (created):

ERROR    2011-08-11 20:40:17,318 __init__.py:463] HTTP Error 201: Created

So why is urllib2 throwing HTTPErrors on this successful request?

It's not too much of a pain; I can easily extend the code to:

try:
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
except HTTPError, e:
    if e.code == 201:
        # success! :)
    else:
        # fail! :(
else:
    # when will this happen...?

But this doesn't seem like the intended behavior, based on the documentation and the fact that I can't find similar questions about this odd behavior.

Also,开发者_如何学JAVA what should the else block be expecting? If successful status codes are all interpreted as HTTPErrors, then when does urllib2.urlopen() just return a normal file-like response object like all the urllib2 documentation refers to?

You can write a custom Handler class for use with urllib2 to prevent specific error codes from being raised as HTTError. Here's one I've used before:

class BetterHTTPErrorProcessor(urllib2.BaseHandler):
    # a substitute/supplement to urllib2.HTTPErrorProcessor
    # that doesn't raise exceptions on status codes 201,204,206
    def http_error_201(self, request, response, code, msg, hdrs):
        return response
    def http_error_204(self, request, response, code, msg, hdrs):
        return response
    def http_error_206(self, request, response, code, msg, hdrs):
        return response

Then you can use it like:

opener = urllib2.build_opener(self.BetterHTTPErrorProcessor)
urllib2.install_opener(opener)

req = urllib2.Request(url, data, headers)
urllib2.urlopen(req)

As the actual library documentation mentions:

For 200 error codes, the response object is returned immediately.

For non-200 error codes, this simply passes the job on to the protocol_error_code handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.

http://docs.python.org/library/urllib2.html#httperrorprocessor-objects

I personally think it was a mistake and very nonintuitive for this to be the default behavior. It's true that non-2XX codes imply a protocol level error, but turning that into an exception is too far (in my opinion at least).

In any case, I think the most elegant way to avoid this is:

opener = urllib.request.build_opener()
for processor in opener.process_response['https']: # or http, depending on what you're using
   if isinstance(processor, urllib.request.HTTPErrorProcessor): # HTTPErrorProcessor also for https
       opener.process_response['https'].remove(processor)
       break # there's only one such handler by default
response = opener.open('https://www.google.com')

Now you have the response object. You can check it's status code, headers, body, etc.

继续阅读：http-status-codes python urllib2

Why does Python's urllib2.urlopen() raise an HTTPError for successful status codes?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？