
Python - Error Parsing HTML w/ BeautifulSoup

I'm developing an app that enters data into a web page's shopping cart and verifies the total. That part works fine; however, I'm having an issue parsing the HTML output.

A previous discussion, "retrieving essential data from a webpage using python", recommended using BeautifulSoup to solve that user's problem.

I've borrowed some of the Python code and got it to work on a macOS system. However, when I copied the code over to an Ubuntu installation, I started seeing a strange error.

**The Code (where I'm seeing the issue):**

    response = opener.open(req)
    html = response.read()
    doc = BeautifulSoup.BeautifulSoup(html)

    table = doc.find('tr', {'id':'carttablerow0'})

    dump = [cell.getText().strip() for cell in table.findAll('td')]

    print "\n Catalog Number: %s \n Description: %s \n Price: %s\n" %(dump[0], dump[1], dump[5])

**The Error (on the Ubuntu server):**

    Traceback (most recent call last):
      File "./shopping_cart_checker.py", line 49, in <module>
        dump = [cell.getText().strip() for cell in table.findAll('td')]
    TypeError: 'NoneType' object is not callable

I think I've narrowed it down to getText() being the culprit, but I'm not certain why this works on macOS and not on Ubuntu.

Any suggestions?

Thank you.

@@@@@@@@@@@@@@@@@@@@@@@@@

Hi Guys,

Thank you for the various suggestions. I've attempted most of them (including incorporating the "if cell" statement into the code); however, it still isn't working.

@ Ignacio Vazquez-Abrams -- Here's a copy of the HTML I'm attempting to strip:

http://pastebin.com/WdaeExnC


As to why it doesn't work on Ubuntu, no idea. However, you can try this:

    dump = [(cell.getText() if cell.getText() else '').strip() for cell in table.findAll('td')]
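To check whether the `None` comes from the cell itself or from the `getText` attribute, something like the following might help. It is only a sketch: `describe_cell` is a made-up helper name, and the two `FakeCell...` classes are stand-ins so the snippet runs without BeautifulSoup installed. The underlying assumption (not confirmed anywhere in this thread) is that on the failing install `cell.getText` itself evaluates to `None`, which would produce exactly this `TypeError`.

```python
def describe_cell(cell):
    """Report which part of cell.getText() would fail, without calling it blindly."""
    if cell is None:
        return "cell is None"
    # Look the attribute up first instead of calling it directly.
    get_text = getattr(cell, "getText", None)
    if not callable(get_text):
        return "cell.getText is not callable: %r" % (get_text,)
    return "ok: %r" % (get_text().strip(),)


class FakeCellWithText(object):
    """Hypothetical stand-in for a <td> whose getText() works."""
    def getText(self):
        return "  ABC-123  "


class FakeCellWithoutText(object):
    """Hypothetical stand-in for a <td> whose getText lookup yields None."""
    getText = None


for cell in (FakeCellWithText(), FakeCellWithoutText(), None):
    print(describe_cell(cell))
```

Running the same helper over `table.findAll('td')` on both machines should show whether the two installs actually differ at the `getText` lookup.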


It doesn't seem to be a problem with the code but with the HTML you are reading. What I would do is change your code to this:

    dump = [cell.getText().strip() for cell in table.findAll('td') if cell]

That way, if cell is None, it will not try to execute getText and will just skip that cell. You should debug if you can; I recommend pdb or ipdb (the one I like to use). Here is a tutorial; with that you can stop just before the line and print values, etc.
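A slightly more defensive version of the same idea, written as a loop so there is a natural place to drop into the debugger. This is a rough sketch, not a drop-in fix: `cell_texts` and `StubCell` are names invented for illustration, and the stub only mimics `getText()` so the example runs on its own.

```python
def cell_texts(cells):
    """Return stripped text for each usable cell, skipping the rest."""
    texts = []
    for cell in cells:
        get_text = getattr(cell, "getText", None) if cell is not None else None
        if not callable(get_text):
            # To inspect the offending cell interactively, uncomment:
            # import pdb; pdb.set_trace()
            continue
        texts.append(get_text().strip())
    return texts


class StubCell(object):
    """Hypothetical stand-in for a parsed <td>; only mimics getText()."""
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


cells = [StubCell(" 12345 "), None, StubCell("Widget\n")]
print(cell_texts(cells))
```

Keep in mind that skipping cells shifts the positions in the resulting list, so an index like `dump[5]` may no longer point at the price column. Appending an empty string instead of using `continue` would preserve the column positions.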
