
Can't remove node in Nokogiri

I'm having a bit of a strange issue with Nokogiri in Rails. I'm trying to remove a "p" tag with a class of "why". I have the following code, which doesn't work:

def test_grab
  f = File.open("public/test.html")
  @doc = Nokogiri::HTML.parse(f)
  f.close
  @doc = @doc.css("p")
  @doc.each do |p|
    if p["class"] == "why"
      logger.info p.values
      p.remove
    end
  end
end

test.html:

<html>
<head>
    <title>Test</title>
</head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>
    <p class="why">Why is this still here?</p>
</body>
</html>

Output html source:

<p>Test data</p>
<p>More <a href="http://stackoverflow.com">Test Data</a></p>
<p class="why">Why is this still here?</p>

I know the Rails code is entering the if block because the logger.info output shows up in the server terminal.

Any ideas?


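Is there any reason you're reusing your @doc instance variable? After @doc = @doc.css("p"), @doc is no longer the document but the NodeSet of paragraphs. Calling remove detaches a node from the document, but the NodeSet still holds a reference to it, so rendering @doc still shows the paragraph.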

When it comes to troubleshooting stuff like this, I find the best idea is to try evaluating the same code without the Rails overhead. For example:

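require 'nokogiri'

# DATA reads everything after __END__ in this file
doc = Nokogiri::HTML(DATA)
doc.css("p").each do |p|
  p.remove if p["class"] == "why"
end

# Print the document itself, not the NodeSet
puts doc.to_html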

__END__
<html>
<head>
    <title>Test</title>
</head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>
    <p class="why">Why is this still here?</p>
</body>
</html>

Which returns:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Test</title></head>
<body>
    <p>Test data</p>
    <p>More <a href="http://stackoverflow.com">Test Data</a></p>

</body>
</html>

Now try assigning paragraphs = @doc.css("p") and iterating with paragraphs.each, or just omit the assignment altogether, as I have above.
