How do I catch all possible errors with url fetch (python)?

2023-02-15 05:45 问答作者：

In my application users enter a url and I try to open the link and get the title of the page. But I realized that there can be many different kinds of errors, including unicode characters or newlines in titles and AttributeError and IOError. I first tried to catch each error, but now in case of a url fetch error I want to redirect to an error page where the user will enter the title manually. How do I catch all possible errors? This is the code I have now:

    title = "title"

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")     
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:    
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:        
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:    
    except IOError:        
        self.redirect("/urlparseerror?error=IOError")


    #I tried this else clause to catch any other error
    #but it does not work
    #this i开发者_如何学Gos executed when none of the errors above is true:
    #
    #else:
    #    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")

UPDATE

As suggested by @Wooble in the comments I added try...except while writing the title to database:

        try:
            new_item = Main(
                        ....
                        title = unicode(title, "utf-8"))

            new_item.put()

        except UnicodeDecodeError:    

            self.redirect("/urlparseerror?error=UnicodeDecodeError")

This works. Although the out-of-range character â€”is still in title according to the logging info:

***title: 7.2. re â€” Regular expression operations &mdash; Python v2.7.1 documentation**

Do you know why?

You can use except without specifying any type to catch all exceptions.

From the python docs http://docs.python.org/tutorial/errors.html:

import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise

The last except will catch any exception that has not been caught before (i.e. a Exception which is not of IOError or ValueError.)

You can use the top level exception type Exception, which will catch any exception that has not been caught before.

http://docs.python.org/library/exceptions.html#exception-hierarchy

try:

    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)

    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")     
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

except UnicodeDecodeError:    
    self.redirect("/urlparseerror?error=UnicodeDecodeError")

except AttributeError:        
    self.redirect("/urlparseerror?error=AttributeError")

#https url:    
except IOError:        
    self.redirect("/urlparseerror?error=IOError")

except Exception, ex:
    print "Exception caught: %s" % ex.__class__.__name__

继续阅读：google-app-engine urlfetch

How do I catch all possible errors with url fetch (python)?

更多精彩内容

精彩评论

最新问答

宫颈癌术后可以性生活吗？

决战平安京人面树赏金特典皮肤什么时候上线?？

CF2024宠粉节活动入口在哪?？

原神养石任务怎么做?？

射戮骑士什么时候发售?？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？