Parsing HTML in Google App Engine in Python with BeautifulSoup?
I've been using BeautifulSoup to parse HTML from several sites, adding each site to the GAE task queue. However, the task queue seems to repeat two tasks, which either generate ApplicationError: 5 errors in the log or fail with "'NoneType' object has no attribute 'findAll'". When I tested this in IDLE, BeautifulSoup returned None when it failed to find anything in the page I passed it. I added the code below, but this doesn't appear to solve the problem:
productTable = soup.find('table')
if productTable == None:
    logging.error('Could not find the product table')
    break
if productTable.findAll('table') == None:
    logging.error('Product table was empty')
    break
I'm wondering if anyone could give me some suggestions as to what is wrong so I can fix it.
The application error probably indicates that your urlfetch to retrieve the HTML has failed. The task queue will automatically retry the task until it succeeds (if used with the default settings). I wouldn't worry too much about this error if it only occurs once in a while and goes away after being retried. If a given task fails repeatedly, then I'd suspect there is some problem with the resource you are trying to fetch.
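One way to keep the two kinds of failure apart is sketched below. It is only an illustration: `process_site`, `fetch`, and `parse` are hypothetical names, not part of the App Engine API, and `fetch` stands in for whatever urlfetch call the task makes. The idea is that a failed fetch is re-raised so the task fails and the queue retries it, while a page that simply lacks the expected table is logged and the handler returns normally, so that task is not retried indefinitely.

```python
import logging


def process_site(url, fetch, parse):
    """Fetch and parse one site; return parsed rows, or None if the page is unusable.

    `fetch` and `parse` are hypothetical callables supplied by the caller:
    fetch(url) returns the HTML or raises, parse(html) returns rows or None.
    """
    try:
        html = fetch(url)
    except Exception:
        # Possibly transient fetch failure: log it and re-raise so the
        # task fails and the task queue retries it.
        logging.exception('Fetch failed for %s', url)
        raise

    rows = parse(html)
    if rows is None:
        # The page was fetched but has no usable table; retrying will not help,
        # so return normally instead of raising.
        logging.error('No product table found at %s', url)
        return None
    return rows
```

With this shape, an uncaught AttributeError from the parsing step no longer turns a permanently unparseable page into a task that the queue keeps repeating.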
If you check that productTable is not None before using it, you should not get the "'NoneType' object has no attribute 'findAll'" error. Since you still see it, it seems that a failed check is not actually causing your productTable.findAll call to be bypassed.
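A minimal sketch of such a guard, assuming `soup` is a BeautifulSoup object (or any tag) that provides find() and findAll(). The function name `find_nested_tables` is made up for illustration. It returns early instead of relying on `break`, so the findAll() call cannot be reached when the table is missing. Note also that findAll() returns an empty list, not None, when nothing matches, so the second check tests for emptiness.

```python
import logging


def find_nested_tables(soup):
    """Return the tables nested in the first <table>, or [] if there are none."""
    product_table = soup.find('table')
    if product_table is None:
        # find() returns None when nothing matches, so stop here
        # before findAll() is ever called on it.
        logging.error('Could not find the product table')
        return []

    nested = product_table.findAll('table')
    if not nested:
        # findAll() returns an empty list rather than None when nothing matches.
        logging.error('Product table was empty')
        return []
    return nested
```

A caller can then treat an empty result as "nothing to do for this page" and finish the task normally.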