Can't remove node in Nokogiri
I'm having a bit of a strange issue with Nokogiri in Rails. I'm trying to remove a "p" tag with a class of "why". I have the following code, which doesn't work:
def test_grab
  f = File.open("public/test.html")
  @doc = Nokogiri::HTML.parse(f)
  f.close
  @doc = @doc.css("p")
  @doc.each do |p|
    if p["class"] == "why"
      logger.info p.values
      p.remove
    end
  end
end
test.html:
<html>
<head>
<title>Test</title>
</head>
<body>
<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
<p class="why">Why is this still here?</p>
</body>
</html>
Output html source:
<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
<p class="why">Why is this still here?</p>
I know the Rails code is entering the if block because the logger.info output shows up in the server terminal.
Any ideas?
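Is there any reason you're reusing your @doc instance variable? After @doc = @doc.css("p"), @doc holds a NodeSet of the three paragraphs rather than the document. Calling remove unlinks the node from the document, but the NodeSet still references it, so printing the set still shows it.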
When it comes to troubleshooting stuff like this, I find the best idea is to try evaluating the same code without the Rails overhead. For example:
require 'nokogiri'
doc = Nokogiri::HTML(DATA)
doc.css("p").each do |p|
  p.remove if p["class"] == "why"
end
# Print the document so the result is visible when run as a script
puts doc.to_html
__END__
<html>
<head>
<title>Test</title>
</head>
<body>
<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
<p class="why">Why is this still here?</p>
</body>
</html>
Which returns:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Test</title></head>
<body>
<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
</body>
</html>
Now try doing paragraphs = @doc.css("p") and then paragraphs.each .., or just omit the whole assignment like I have above.