
Python: Counting URLs in a specific class

I would like to count the URLs in a specific class. The heading with that class is

<h1 class="sectionTitle">INSIDERS AT LOEWS CORP (L)</h1>

and the section under it has some links like

<a href="../../../research/stocks/people/relationship.asp?personId=228893&symbol=L:US">

I would like to count the number of links of this kind under this heading only. This is my program, but when I use "count" it doesn't work.

i = 0
headings = bs.find('h1', text='INSIDERS AT LOEWS CORP (L)')
for section2 in headings.findNext(''):
    aa= section2.findAll('a', {'href': True})
    bb=aa.count('href')
    print bb
i = i + 1;

It doesn't work. Would you mind giving me a tip on how to solve this? Thank you so much! Here is the HTML of the whole section:

<h1 class="sectionTitle">INSIDERS AT LOEWS CORP (L)</h1>
<table cellpadding="0" cellspacing="0" class="table" width="100%" style="margin-bottom:5px;"><thead><tr><td>Name (Connections)</td><td colspan="2" style="width:120px;">Board Relationships</td><td>Title</td><td>Type of Board Member</td><td align="right">Age</td></tr></thead><tr><td><a href="../../../research/stocks/people/person.asp?personId=228893&symbol=L:US" class="link_xsb">Andrew Tisch  </a></td><td style="width:28px; padding-left: 5px;"><a href="../../../research/stocks/people/relationship.asp?personId=228893&symbol=L:US"><img src="../../images/icons/people2.gif" style="vertical-align:middle" / ></a></td><td> <strong><a href="/businessweek/research/stocks/people/relationship.asp?personId=228893&symbol=L:US">53</strong> Relationships</a></td><td style="width:200px">Co-Chairman, Member of the Office of the President, Chairman of Executive Committee, Member of Finance Committee and Chairman of Bulova</td><td >--</td><td align="right" style="width:20px">61</td></tr><tr><td><a href="../../../research/stocks/people/person.asp?personId=285942&symbol=L:US" class="link_xsb">Jonathan Tisch  </a></td><td style="width:28px; padding-left: 5px;"><a href="../../../research/stocks/people/relationship.asp?personId=285942&symbol=L:US"><img src="../../images/icons/people2.gif" style="vertical-align:middle" / ></a></td><td> <strong><a href="/businessweek/research/stocks/people/relationship.asp?personId=285942&symbol=L:US">56</strong> Relationships</a></td><td style="width:200px">Co-Chairman, Member of the Office of the President, Member of Executive Committee, Chairman of Loews Hotels and Chief Executive Officer of Loews Hotels</td><td >--</td><td align="right" style="width:20px">57</td></tr><tr><td><a href="../../../research/stocks/people/person.asp?personId=285936&symbol=L:US" class="link_xsb">James Tisch  </a></td><td style="width:28px; padding-left: 5px;"><a href="../../../research/stocks/people/relationship.asp?personId=285936&symbol=L:US"><img 
src="../../images/icons/people3.gif" style="vertical-align:middle" / ></a></td><td> <strong><a href="/businessweek/research/stocks/people/relationship.asp?personId=285936&symbol=L:US">240</strong> Relationships</a></td><td style="width:200px">Chief Executive Officer, President, Member of Office of the President, Director, Member of Executive Committee, Member of Finance Committee, Chairman of Diamond Offshore and Director of CNA</td><td >--</td><td align="right" style="width:20px">58</td></tr></table>
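A note on the count call in the question: findAll returns a list of Tag objects (its result type is a list subclass), and list.count(x) counts the elements that are equal to x. No Tag is equal to the string 'href', so aa.count('href') is always 0, whereas len(aa) gives the number of links found. The sketch below uses plain strings as hypothetical stand-ins for the Tag objects to show the difference.

```python
# Hypothetical stand-ins for the Tag objects that findAll returns.
# A real Tag never compares equal to the string 'href' either.
aa = [
    '<a href="../../../research/stocks/people/person.asp?personId=228893&symbol=L:US">',
    '<a href="../../../research/stocks/people/relationship.asp?personId=228893&symbol=L:US">',
    '<a href="/businessweek/research/stocks/people/relationship.asp?personId=228893&symbol=L:US">',
]

# list.count checks equality, not substrings, so nothing matches 'href'.
print(aa.count('href'))  # 0
# len() is the number of elements, i.e. the number of links found.
print(len(aa))           # 3
```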


Being a big fan of jQuery, I recommend PyQuery, which offers powerful jQuery-style selectors.

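from pyquery import PyQuery as pq
with open('your.html') as f:
    dom = pq(f.read())
# count the <a> elements inside the table that immediately follows <h1 class="sectionTitle">
print(len(dom('h1.sectionTitle + table a')))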

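In the selector, h1 is the element name and . introduces a class name (use # instead if the target is identified by an id rather than a class). + selects the element immediately following, in this case the table right after the h1. Adding a after table makes the selector return the A elements inside that table. Note that this counts every link in the table, including the person.asp name links; to count only the relationship.asp links, an attribute selector such as h1.sectionTitle + table a[href*="relationship.asp"] should work.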
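If you would rather avoid a third-party dependency, a similar count can be sketched with the standard library's html.parser. This is a minimal sketch under stated assumptions, not PyQuery's behavior: the class SectionLinkCounter and the helper count_section_links are hypothetical names, and the parser counts links in the first table after the heading instead of strictly checking that the table is the h1's adjacent sibling as + does. Passing href_substring="relationship.asp" restricts the count to the relationship links from the question.

```python
from html.parser import HTMLParser


class SectionLinkCounter(HTMLParser):
    """Count <a href> tags inside the first <table> after <h1 class="sectionTitle">.

    Roughly mirrors the CSS selector 'h1.sectionTitle + table a', except that
    it takes the first following table instead of checking strict adjacency.
    """

    def __init__(self, href_substring=""):
        super().__init__()
        self.href_substring = href_substring  # "" matches every href
        self.seen_heading = False  # have we passed the sectionTitle <h1>?
        self.table_depth = 0       # nesting depth inside the target table
        self.done = False          # has the target table been closed?
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h1" and "sectionTitle" in classes:
            self.seen_heading = True
        elif tag == "table" and (self.seen_heading or self.table_depth):
            # Enter the target table, or a table nested inside it.
            self.table_depth += 1
        elif tag == "a" and self.table_depth:
            href = attrs.get("href")
            if href is not None and self.href_substring in href:
                self.count += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
            if self.table_depth == 0:
                self.done = True  # ignore everything after the table


def count_section_links(html, href_substring=""):
    """Hypothetical helper: parse html and return the link count."""
    parser = SectionLinkCounter(href_substring)
    parser.feed(html)
    parser.close()
    return parser.count


SAMPLE = """
<p><a href="outside.asp">not counted</a></p>
<h1 class="sectionTitle">INSIDERS AT LOEWS CORP (L)</h1>
<table>
<tr><td><a href="person.asp?personId=228893">Andrew Tisch</a></td>
<td><a href="relationship.asp?personId=228893"><img src="people2.gif"></a></td>
<td><strong><a href="/relationship.asp?personId=228893">53</a></strong></td></tr>
</table>
<a href="after.asp">not counted</a>
"""

print(count_section_links(SAMPLE))                      # 3
print(count_section_links(SAMPLE, "relationship.asp"))  # 2
```

Tracking the table nesting depth, rather than a simple inside/outside flag, keeps the count correct if the target table contains a nested table.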
