How do I remove an image node with the given src attribute?

I need to remove an image with the given src:

img_src = "http://domain/img.jpg"
@doc.xpath("//img[@src='#{img_src}']")[0].remove

That doesn't work. I also tried it like this:

@doc.xpath("//img[@src='#{img_src}']") {|x| x.remove}

Doesn't work either. Any ideas on what I'm doing wrong?

I got it. It was a stupid mistake. All your solutions were correct.


Nokogiri has two different parser modes, one for XML and one for HTML. XML is strict and HTML is very relaxed because, well, HTML is not always well-behaved.

doc = Nokogiri::XML('<xml><a>1</a></xml>')

or

doc = Nokogiri::HTML('<html><body>foo</body></html>')

This is how I generally parse an HTML file:

require 'nokogiri'
require 'open-uri'

# Since Ruby 3.0, Kernel#open no longer fetches URLs; use URI.open from open-uri.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))
print doc.to_html
# >> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
[...]

To strip a tag you need to locate it first, then remove it. After parsing an HTML or XML document we have a Nokogiri::HTML::Document or a Nokogiri::XML::Document respectively, and what we called "tags" are now called "nodes". Nokogiri can return a NodeSet, which holds every node that matches a search, or a single node, which is the first match of a search.

The following finds the first img node with src="a.png" using a CSS selector, which is generally easier to read than XPath. Nokogiri understands both XPath and CSS very well, and its website mentions some advantages of CSS:

require 'nokogiri'
require 'open-uri'

html = '<html><body><img src="a.png" /><img src="b.png" /></body></html>'

doc = Nokogiri::HTML(html)
doc.at('img[@src="a.png"]').remove
print doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><img src="b.png"></body></html>

To remove all nodes matching the selector, replace doc.at('img[@src="a.png"]').remove with:

doc.search('img[@src="a.png"]').each { |n| n.remove }

The tutorials are worth reading too.


Works for me:

require 'nokogiri'
xml = <<ENDXML
  <root>
    <img src="http://foo/foo.jpg" />
    <img src="http://bar/bar.jpg" />
  </root>
ENDXML

doc = Nokogiri::XML xml
img_src = "http://foo/foo.jpg"

doc.at_xpath("//img[@src='#{img_src}']").remove

puts doc
#=> <?xml version="1.0"?>
#=> <root>
#=> 
#=> <img src="http://bar/bar.jpg"/>
#=> </root>