Parsing HTML with Rails and Nokogiri
I need to parse HTML using Rails and Nokogiri. Here is the HTML:
<body>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
<div id="mama">
<div class="test1">text</div>
<div class="test2">text2</div>
</div>
</body>
How should I write the loop? I've tried many times but I keep getting an error or bad results...
doc.xpath('//div[@id='mama']/?or what?').each do |node|
parse_file.puts text1
parse_file.puts text2
parse_file.puts text1
parse_file.puts \n
end
Result should be like
text from first mama
text2 from first mama
text from first mama
text from second mama
and so on...
First, note that the HTML you posted is invalid: an id value must be unique within a document, so no two elements may share the same id
attribute value. If you have control over your HTML, you should fix this problem.
Using that same (invalid) HTML, however, Nokogiri still has no trouble:
require 'nokogiri'
doc = Nokogiri::HTML(my_html)
doc.css('#mama').each_with_index do |div,i|
puts "#{div.at_css('.test1').text} from mama ##{i}"
puts "#{div.at_css('.test2').text} from mama ##{i}"
end
#=> text from mama #0
#=> text2 from mama #0
#=> text from mama #1
#=> text2 from mama #1
#=> text from mama #2
#=> text2 from mama #2
If you wanted to use XPath directly (as Nokogiri does behind the scenes for the CSS) you would do this:
doc.xpath("//div[@id='mama']").each_with_index do |div,i|
puts "#{div.at_xpath("./*[@class='test1']").text} from mama ##{i}"
puts "#{div.at_xpath("./*[@class='test2']").text} from mama ##{i}"
end
For one thing, your quotes are off: the single quotes around mama end the single-quoted Ruby string early. They should be:
doc.xpath('//div[@id="mama"]/?or what?')