
Python BeautifulSoup adding extra end tags

I'm using BeautifulSoup to parse a website:

  import urllib2
  import BeautifulSoup  # BeautifulSoup 3, Python 2

  request = urllib2.Request(url)
  response = urllib2.urlopen(request)
  soup = BeautifulSoup.BeautifulSoup(response)

I'm using it to traverse a table. The problem is that BeautifulSoup inserts an extra closing table tag into the HTML that doesn't exist in the source, which I verified with print soup.prettify(). As a result, one of the td tags ends up outside the table and I can't select it.
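Before blaming the parser, it can help to check whether the raw HTML itself is balanced. The sketch below is a Python 3 diagnostic that uses the standard library's html.parser.HTMLParser (not BeautifulSoup) to count start and end tags exactly as they appear in the source. The names TagCounter and count_tags and the sample markup are hypothetical. If the counts don't match, the page is malformed and whatever parser you use has to repair it somehow, which can move cells out of the table.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start and end tags as they appear in the raw markup (no repair)."""

    def __init__(self):
        super().__init__()
        self.starts = Counter()
        self.ends = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        self.starts[tag] += 1

    def handle_endtag(self, tag):
        # Called for every end tag, even stray ones with no matching start tag.
        self.ends[tag] += 1


def count_tags(html, tags=("table", "tr", "td")):
    """Return {tag: (start_count, end_count)} for the given raw HTML string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {tag: (counter.starts[tag], counter.ends[tag]) for tag in tags}


# Hypothetical malformed snippet: a stray </tr> with no matching <tr>.
raw = "<table><tr><td>a</td></tr></tr><td>b</td></table>"
print(count_tags(raw))  # {'table': (1, 1), 'tr': (1, 2), 'td': (2, 2)}
```

Feed it the same decoded page source you pass to BeautifulSoup and compare the result with what prettify() shows.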


How about searching directly for each tag instead of trying to traverse into the table?

    # findAll returns every matching td; find("td") returns only the first one
    for td in soup.findAll("td"):
        ...

It's not unusual to find a tbody tag nested inside a table even when it isn't in the source, because it can be inserted automatically. You can either code for it or just jump straight to the tr or td tags.
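Collecting every td directly, instead of walking table, then tr, then td, is soup.findAll("td") in BeautifulSoup 3 or soup.find_all("td") in BeautifulSoup 4. To illustrate the same idea without third-party packages, here is a hedged Python 3 sketch using the standard library's html.parser.HTMLParser; CellCollector and all_cells are hypothetical names. Because it only reacts to td boundaries, a stray table end tag or an implied tbody has no effect on which cells it finds.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> in document order, regardless of
    where the surrounding table (or an implied <tbody>) starts or ends."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # An unclosed <td> is treated as ending where the next one starts.
            self._flush()
            self._in_td = True

    def handle_endtag(self, tag):
        # </td>, </tr> or </table> all end the current cell, if any.
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)

    def _flush(self):
        if self._in_td:
            self.cells.append("".join(self._buf).strip())
        self._in_td = False
        self._buf = []


def all_cells(html):
    """Return the stripped text of every <td> in the HTML string."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return collector.cells


# The second cell sits outside the table, yet it is still found.
raw = "<table><tr><td>a</td></tr></table><td>b</td>"
print(all_cells(raw))  # ['a', 'b']
```

The sketch is deliberately simple: th cells are ignored and tables nested inside a cell are not handled correctly, so for real pages the BeautifulSoup findAll/find_all call is the better fit.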
