Beautiful Soup line matching
I'm trying to build an HTML table that contains only the table header and the row that is relevant to me. The site I'm using is http://wolk.vlan77.be/~gerben.
I want to extract the table header and my own table entry so that I do not have to search for my name each time.
What I want to do:
- get the HTML page
- parse it to get the header of the table
- parse it to get the table row relevant to me (the row containing lucas)
- build an HTML page that shows the header and the table entry relevant to me
What I am doing now:
- get the header with BeautifulSoup first
- get my entry
- add both to an array
- pass this array to a method that generates a string that can be printed as an HTML page
    def downloadURL(self):
        global input
        filehandle = self.urllib.urlopen('http://wolk.vlan77.be/~gerben')
        input = ''
        for line in filehandle.readlines():
            input += line
        filehandle.close()
    def soupParserToTable(self, input):
        global header
        soup = self.BeautifulSoup(input)
        header = soup.first('tr')
        tableInput = '0'
        table = soup.findAll('tr')
        for line in table:
            print line
            print '\n \n'
            if '''lucas''' in line:
                print 'true'
            else:
                print 'false'
            print '\n \n **************** \n \n'
I want to get the row from the HTML file that contains lucas. However, when I run the code above, I get this in my output:
****************
<tr><td>lucas.vlan77.be</td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> </tr>
false
I don't understand why it doesn't match. The string lucas is clearly in there.
It looks like you're over-complicating this.
Here's a simpler version...
>>> import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen('http://wolk.vlan77.be/~gerben')
>>> soup = BeautifulSoup.BeautifulSoup(html)
>>> print soup.find('td', text=lambda data: data.string and 'lucas' in data.string)
lucas.vlan77.be
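For what it's worth, here is a rough sketch of the same lookup with the newer bs4 package on Python 3, walking from the matching cell back up to its row. The sample HTML is made up to mirror the row in the question, and the header cell names (host, ping, ssh, http) are assumptions, since the real page's header is not shown.

```python
from bs4 import BeautifulSoup

# Made-up sample that mirrors the row structure shown in the question.
# The header names are assumptions, not taken from the real page.
SAMPLE_HTML = """
<table>
<tr><th>host</th><th>ping</th><th>ssh</th><th>http</th></tr>
<tr><td>gerben.vlan77.be</td><td><span>V</span></td><td><span>X</span></td><td><span>V</span></td></tr>
<tr><td>lucas.vlan77.be</td><td><span>V</span></td><td><span>V</span></td><td><span>V</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Find the first <td> whose text contains the name.
# The callable is given the tag's .string, which can be None,
# hence the explicit guard before the substring test.
cell = soup.find("td", string=lambda s: s is not None and "lucas" in s)

# Walk up from the cell to the enclosing row, and grab the first row as header.
row = cell.find_parent("tr") if cell is not None else None
header = soup.find("tr")

print(header)
print(row)
```

Printing `header` and `row` gives the two pieces of markup needed for the reduced table.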
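It's because line is not a string but a BeautifulSoup.Tag instance. The in operator on a Tag checks the tag's direct children, not its text, so a plain string never matches. Try checking the td value instead: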
if '''lucas''' in line.td.string:
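Putting that together, a minimal sketch of the whole filter might look like the following. It assumes Python 3 with the bs4 package rather than the older BeautifulSoup module used in the question. The function name `build_filtered_table` and the sample HTML (including the header names) are hypothetical, chosen only for illustration. It compares against `get_text()` for the whole row instead of `line.td.string`, so the name is found in any cell.

```python
from bs4 import BeautifulSoup

# Made-up sample that mirrors the row structure shown in the question.
SAMPLE_HTML = """
<table>
<tr><th>host</th><th>ping</th><th>ssh</th><th>http</th></tr>
<tr><td>gerben.vlan77.be</td><td><span>V</span></td><td><span>X</span></td><td><span>V</span></td></tr>
<tr><td>lucas.vlan77.be</td><td><span>V</span></td><td><span>V</span></td><td><span>V</span></td></tr>
</table>
"""


def build_filtered_table(html, name):
    """Return an HTML table with the first row plus every row mentioning name."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    # get_text() flattens a row into a plain string, so `in` is a substring test.
    # `name in row` would compare name against the row's child nodes instead.
    matches = [row for row in body if name in row.get_text()]
    lines = [str(r) for r in [header] + matches]
    return "<table>\n" + "\n".join(lines) + "\n</table>"


page = build_filtered_table(SAMPLE_HTML, "lucas")
print(page)
```

Swapping `SAMPLE_HTML` for the downloaded page contents should give the header plus the lucas row, provided the real table's first row is the header.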