Python - Error Parsing HTML w/ BeautifulSoup
I'm developing an app that inputs data onto a webpage shopping cart and verifies the total. That works fine; however, I am having an issue with parsing the HTML output.
A previous discussion, "retrieving essential data from a webpage using python", recommended using BeautifulSoup to solve that user's problem.
I've borrowed some of the Python code and got it to work on a MacOS system. However, when I copied the code over to an Ubuntu installation, I started seeing a strange error.
**The Code (where I'm seeing the issue):**
response = opener.open(req)
html = response.read()
doc = BeautifulSoup.BeautifulSoup(html)
table = doc.find('tr', {'id':'carttablerow0'})
dump = [cell.getText().strip() for cell in table.findAll('td')]
print "\n Catalog Number: %s \n Description: %s \n Price: %s\n" %(dump[0], dump[1], dump[5])
**The Error (on the Ubuntu server):**
Traceback (most recent call last):
File "./shopping_cart_checker.py", line 49, in <module>
dump = [cell.getText().strip() for cell in table.findAll('td')]
TypeError: 'NoneType' object is not callable
I think I've narrowed it down to getText() being the culprit, but I'm not certain why this works on MacOS and not on Ubuntu.
Any suggestions?
Thank you.
@@@@@@@@@@@@@@@@@@@@@@@@@
Hi Guys,
Thank you for the various suggestions. I've attempted most of them (including incorporating the "if cell" statement into the code); however, it still isn't working.
@ Ignacio Vazquez-Abrams -- Here's a copy of the HTML I'm attempting to strip:
http://pastebin.com/WdaeExnC
As to why it doesn't work on Ubuntu, no idea. However, you can try this:
dump = [(cell.getText() if cell.getText() else '').strip() for cell in table.findAll('td')]
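Note that the conditional above still calls `cell.getText()`, so it cannot help if `getText` itself is what evaluates to `None`. One possible explanation, which is an assumption and not something confirmed for this Ubuntu install, is that the server has an older BeautifulSoup 3 release whose tags have no `getText` method. BeautifulSoup 3 treats an unknown attribute such as `cell.getText` as a search for a child tag of that name and returns `None` when none exists, which would produce exactly "'NoneType' object is not callable". A minimal sketch of a helper that does not depend on `getText` being present (`cell_text` is a hypothetical name, not part of BeautifulSoup):

```python
def cell_text(cell):
    """Return the stripped text of a tag, or '' if there is nothing to read.

    Hypothetical helper: uses getText() only when it is really callable,
    and otherwise joins the tag's text nodes from findAll(text=True).
    """
    if cell is None:
        return ''
    get_text = getattr(cell, 'getText', None)
    if callable(get_text):
        return get_text().strip()
    # Assumed older release: getText resolved to None, so collect text nodes.
    return ''.join(cell.findAll(text=True)).strip()
```

With that in place, the list comprehension would become `dump = [cell_text(cell) for cell in table.findAll('td')]`.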
It doesn't seem to be a problem with the code but with the HTML you are reading. What I would do is change your code to this:
dump = [cell.getText().strip() for cell in table.findAll('td') if cell]
That way, if cell is None, it will not try to execute getText and will just skip that cell. You should debug if you can; I recommend using pdb or ipdb (the one I like to use). Here is a tutorial; with it you can stop just before the failing line and print values, etc.
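If an interactive debugger is not convenient, a non-interactive check can show which name is actually `None`. As far as I know, `findAll` returns tag objects, not `None`, so the `if cell` filter may not change anything; and if `table` itself were `None`, the error would mention a missing `findAll` attribute, not a call on `None`. A minimal sketch, assuming `table` is the result of `doc.find(...)` from the question (`describe_cells` is a hypothetical helper name):

```python
def describe_cells(table):
    """Return one line per <td> saying whether cell.getText is callable."""
    if table is None:
        return ['table is None: no matching row was found']
    lines = []
    for index, cell in enumerate(table.findAll('td')):
        # getattr with a default avoids raising if the attribute is missing.
        get_text = getattr(cell, 'getText', None)
        lines.append('cell %d: getText callable=%s' % (index, callable(get_text)))
    return lines
```

Running `print("\n".join(describe_cells(table)))` just before the failing line should show whether `getText` is missing on every cell, which would point at the installed library and away from the HTML.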