Mechanize not recognizing anchor tags via CSS selector methods
(Hope this isn't a breach of etiquette: I posted this on RailsForum, but I haven't been getting much response from there recently.)
Has anyone else had problems with Mechanize not recognizing anchor tags via CSS selectors?
The HTML looks like this (snippet with white space removed for clarity):
<td class='calendarCell' align='left'>
<a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
<p style="margin-bottom:15px; line-height:14px; text-align:left;">
<span class="sidenavHeadType">
Current Events</span><br />
<b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
<br />
10:30am- 11:45am
</p>
I'm trying to collect the data from these events. Everything is working except getting the anchor within the <p>. There's clearly an <a> tag inside the <b>, and I'm going to need to follow that link to get further details on this event.
In my rake task, I have:
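# agent is a Mechanize instance that has already fetched the calendar page
agent.page.search(".calendarCell,.calendarToday").each do |item|
  day = item.at("a").text    # the day-number link in the cell
  item.search("p").each do |e|
    anchor = e.at("a")       # the event link inside <p><b>
    puts anchor
    puts e.inner_html
  end
end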
What's interesting is that item.at("a") always returns the anchor, but e.at("a") returns nil. And when I call inner_html on the p element, the anchor isn't there at all. Example output:
nil
<span class="sidenavHeadType">
Photo Club</span><br><b>Indexing Slide Collections</b>
<br>
2:00pm- 3:00pm
However, when I run the same scrape directly with Nokogiri:
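# doc is a Nokogiri::HTML document parsed from the page
doc.css(".calendarCell,.calendarToday").each do |item|
  day = item.at_css("a").text
  item.css("p").each do |e|
    link = e.at_css("a")[:href]   # raises NoMethodError if the <p> has no <a>
    puts e.inner_html
  end
end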
It recognizes the <a> inside the <p> and returns the href, etc. Example output:
<span class="sidenavHeadType">
Bridge Party</span><br><b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b>
<br>
7:00pm- 9:00pm
Mechanize uses Nokogiri for parsing, so I'm wondering whether I have a bad version or whether this affects others as well.
Thanks for any leads.
Never mind. False alarm. In my Nokogiri task, I was pointing to a local copy of the page that included the anchors. The live page only shows those links after logging in: I was logged in when I browsed to it, so I could see the a tags, but the rake task's Mechanize agent was not, so the page it fetched had none. Adding the login to the rake task solved it.