Ruby - Mechanize: Select link by classname and other questions
At the moment I'm having a look on Mechanize. I am pretty new to Ruby, so please be patient.
I wrote a little test script:
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title
page_links = Array.new
page.links.each do |ll|
page_links << ll
end
puts page_links.size
This works. But page_links
includes not only the search results. It also includes the google links like Login, Pictures, ...
The result links own a styleclass "1". Is it possible to select only the links with class 开发者_高级运维== 1? How do I achieve this?
Is it possible to modify the "agentalias"? If I own a website, including google analytics or something, what browserclient will I see in ga going with mechanize on my site?
Can I select elements by their ID instead of their name? I tried to use
my_form = page.form_with(:id => 'myformid')
But this does not work.
in such cases like your I am using Nokogiri DOM search. Here is your code a little bit rewritten:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title
page_links = Array.new
#maybe you better use 'h3.r > a.l' here
page.parser.css("a.l").each do |ll|
#page.parser here is Nokogiri::HTML::Document
page_links << ll
puts ll.text + "=>" + ll["href"]
end
puts page_links.size
Probably this article is a good place to start: getting-started-with-nokogiri By the way samples in the article also deal with Google search ;)
You can build a list of just the search result links by changing your code as follows:
page.links.each do |ll|
cls = ll.attributes.attributes['class']
page_links << ll if cls && cls.value == 'l'
end
For each element ll
in page.links
, ll.attributes
is a Nokogiri::XML::Element
and ll.attributes.attributes
is a Hash
containing the attributes on the link, hence the need for ll.attributes.attributes
to get at the actual class and the need for the nil check before comparing the value to 'l'
The problem with using :id
in the criteria to find a form is that it clashes with Ruby's Object#id
method for returning a Ruby object's internal id. I'm not sure what the work around for this is. You would have no problem selecting the form by some other attribute (e.g. its action.)
I believe the selector you are looking for is:
:dom_id
e.g. in your case:
my_form = page.form_with(:dom_id => 'myformid')
精彩评论