
How do I remove HTTP links with ActiveSupport's "starts_with" using Nokogiri?

When I try this:

item.css("a").each do |a|
  if !a.starts_with? 'http://'
     a.replace a.content
  end
end

I get:

NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60> 

EDIT:

I'm sure there is a cleaner way, but this seems to work:

item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://' 
      a.replace a.content
    end
  end
end


The problem is that you're calling starts_with? on an object that doesn't implement it.

item.css("a").each do |a|

yields Nokogiri XML nodes (Nokogiri::XML::Element objects) in a, not strings, and those nodes don't have String methods like starts_with?. What you actually want to check is the value of the node's href attribute, which you can read like this:

a['href']

So, you want to use something like this:

item.css("a").each do |a|
  # a['href'] is a String, or nil when the link has no href (nil.to_s is "")
  unless a['href'].to_s.start_with?('http://')
    a.replace(a.content)
  end
end

The downside is that Ruby has to walk through every <a> tag in the document, which can be slow on a big page with lots of links.

An alternative is to use XPath's starts-with() function:

require 'nokogiri'

item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html

which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>

Here's how to do it using XPath:

item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html

Which outputs:

>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>

The advantage of using XPath to find the nodes is that the filtering runs in compiled C (libxml2, on MRI) rather than in a Ruby loop.


Shouldn't that method be start_with?
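For reference, start_with? is the core Ruby String method; ActiveSupport adds starts_with? as an alias, so the plural spelling only works where ActiveSupport is loaded. A quick plain-Ruby check (the href value is just an example):

```ruby
# Core Ruby (no Rails needed) provides String#start_with?.
href = 'http://bar'

puts href.start_with?('http://')             # true
puts href.start_with?('https://')            # false
# start_with? accepts several prefixes and returns true if any of them match.
puts href.start_with?('http://', 'https://') # true

# Without ActiveSupport loaded, the plural spelling does not exist.
puts href.respond_to?(:starts_with?)         # false
```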
