Beautiful Soup line matching

I'm trying to build an HTML table that contains only the table header and the row that is relevant to me. The site I'm using is http://wolk.vlan77.be/~gerben.

I want the table header and my own table entry so I don't have to search for my name every time.

What I want to do:

  • get the HTML page
  • parse it to get the table header
  • parse it to get the table row relevant to me (the row containing lucas)
  • build an HTML page that shows the header and my table entry
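The parsing and selection steps above can be sketched without BeautifulSoup, using Python 3's built-in html.parser instead. This is a swapped-in technique, not the asker's approach. It assumes the header is the first <tr> on the page and that "relevant to me" means a plain substring match on cell text. RowCollector, header_and_matching_rows and the sample table are hypothetical names and data, since the page's real header markup is not shown here.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell texts
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup

    def handle_data(self, data):
        # Text inside nested tags such as <span> still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)


def header_and_matching_rows(html, needle):
    """Return the first row (assumed to be the header) followed by every
    later row in which some cell's text contains `needle`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, body = parser.rows[0], parser.rows[1:]
    return [header] + [row for row in body if any(needle in cell for cell in row)]


# Hypothetical sample markup; the real page's header row is not shown.
sample = """<table>
<tr><th>host</th><th>check</th></tr>
<tr><td>other.vlan77.be</td> <td><span style="color:green">V</span></td> </tr>
<tr><td>lucas.vlan77.be</td> <td><span style="color:green">V</span></td> </tr>
</table>"""

for row in header_and_matching_rows(sample, "lucas"):
    print(row)
# ['host', 'check']
# ['lucas.vlan77.be', 'V']
```

Working on cell text rather than on parser objects sidesteps the question of what the in operator means for a parsed element.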

What I am doing now:

  • get the header with BeautifulSoup first
  • get my entry
  • add both to an array (a Python list)
  • pass this array to a method that generates a string that can be printed as an HTML page
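The last step, turning the array into a printable page, needs only string formatting. A minimal sketch, assuming each array element is already an HTML string for one <tr> (for example str(tag) on a BeautifulSoup Tag), with the header row first; build_page and its title parameter are hypothetical names:

```python
from html import escape


def build_page(rows, title="My entry"):
    """Join already-serialised <tr>...</tr> strings into a minimal HTML page.

    `rows` is assumed to hold the header row first, then the matching row(s).
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
        "<body>\n<table>\n{rows}\n</table>\n</body>\n</html>\n"
    ).format(title=escape(title), rows="\n".join(rows))


print(build_page([
    "<tr><th>host</th><th>check</th></tr>",
    "<tr><td>lucas.vlan77.be</td><td>V</td></tr>",
]))
```

The rows are inserted unescaped so the original cell markup (such as the green V spans) survives; only the title, which is plain text, is escaped.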

    def downloadURL(self):
        global input
        filehandle = self.urllib.urlopen('http://wolk.vlan77.be/~gerben')
        input = ''
        for line in filehandle.readlines():
            input += line
        filehandle.close()

    def soupParserToTable(self,input):
        global header
    
        soup = self.BeautifulSoup(input)
        header = soup.first('tr')
        tableInput='0'
    
        table = soup.findAll('tr')
        for line in table:
            print line
            print '\n \n'
            if '''lucas''' in line:
                print 'true'
            else:
                print 'false'
            print '\n \n **************** \n \n'
    

I want to get the line from the HTML file that contains lucas, but when I run the code above I get this output:

 **************** 


<tr><td>lucas.vlan77.be</td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> </tr>



false

I don't understand why it doesn't match; the string lucas is clearly in there. What am I missing?
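For the download step, a Python 3 sketch of downloadURL that returns the page text instead of writing it to a global named input (which also shadows the built-in input). download_url is a hypothetical name, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
from urllib.request import urlopen


def download_url(url="http://wolk.vlan77.be/~gerben"):
    """Fetch `url` and return the body as text rather than filling a global."""
    with urlopen(url) as response:
        raw = response.read()
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Because urllib.request also handles data: URLs, the function can be tried offline; for example download_url("data:,hello") returns "hello".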


It looks like you're over-complicating this.

Here's a simpler version (note that it returns the matching text, not the enclosing row):

>>> import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen('http://wolk.vlan77.be/~gerben')
>>> soup = BeautifulSoup.BeautifulSoup(html)
>>> print soup.find('td', text=lambda data: data.string and 'lucas' in data.string)
lucas.vlan77.be


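It's because line is not a string but a BeautifulSoup.Tag instance, and for a Tag the in operator checks whether the value is one of the tag's child nodes rather than searching its text. Try testing the td's string instead: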

    if 'lucas' in line.td.string:
        print 'true'
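To see why the original test printed false: BeautifulSoup's Tag answers "x in tag" by checking whether x is one of its child nodes (tag.contents), not whether x occurs in its text. The snippet below models the <tr> from the question's output as a plain list so it runs without BeautifulSoup; this is a simplification, since the real children are <td> Tag objects rather than strings.

```python
# The <tr> from the question's output, modelled as its list of child nodes:
# four <td> cells (shown here as their text) separated by " " text nodes.
contents = ["lucas.vlan77.be", " ", "V", " ", "V", " ", "V", " "]

# Membership test, which is what `in` on a Tag does: no child equals "lucas".
print("lucas" in contents)           # False
# Substring test on the joined text finds it.
print("lucas" in "".join(contents))  # True
```

With the current bs4 package, 'lucas' in line.get_text() tests the whole row's text. line.td.string works for this row, but it raises AttributeError on a row with no <td>, such as a header row built from <th> cells.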