How do I catch all possible errors with url fetch (python)?
In my application users enter a url and I try to open the link and get the title of the page. But I realized that there can be many different kinds of errors, including unicode characters or newlines in titles and AttributeError
and IOError
. I first tried to catch each error, but now in case of a url fetch error I want to redirect to an error page where the user will enter the title manually. How do I catch all possible errors? This is the code I have now:
title = "title"
try:
soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
title = str(soup.html.head.title.string)
if title == "404 Not Found":
self.redirect("/urlparseerror")
elif title == "403 - Forbidden":
self.redirect("/urlparseerror")
else:
title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
self.redirect("/urlparseerror?error=IOError")
#I tried this else clause to catch any other error
#but it does not work
#this i开发者_如何学Gos executed when none of the errors above is true:
#
#else:
# self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")
UPDATE
As suggested by @Wooble in the comments I added try...except
while writing the title
to database:
try:
new_item = Main(
....
title = unicode(title, "utf-8"))
new_item.put()
except UnicodeDecodeError:
self.redirect("/urlparseerror?error=UnicodeDecodeError")
This works. Although the out-of-range character —
is still in title
according to the logging info:
***title: 7.2. re — Regular expression operations — Python v2.7.1 documentation**
Do you know why?
You can use except without specifying any type to catch all exceptions.
From the python docs http://docs.python.org/tutorial/errors.html:
import sys
try:
f = open('myfile.txt')
s = f.readline()
i = int(s.strip())
except IOError as (errno, strerror):
print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
print "Could not convert data to an integer."
except:
print "Unexpected error:", sys.exc_info()[0]
raise
The last except will catch any exception that has not been caught before (i.e. a Exception which is not of IOError or ValueError.)
You can use the top level exception type Exception, which will catch any exception that has not been caught before.
http://docs.python.org/library/exceptions.html#exception-hierarchy
try:
soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
title = str(soup.html.head.title.string)
if title == "404 Not Found":
self.redirect("/urlparseerror")
elif title == "403 - Forbidden":
self.redirect("/urlparseerror")
else:
title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
self.redirect("/urlparseerror?error=IOError")
except Exception, ex:
print "Exception caught: %s" % ex.__class__.__name__
精彩评论