How do I remove HTTP links with ActiveSupport's "starts_with" using Nokogiri?
When I try this:
item.css("a").each do |a|
  if !a.starts_with? 'http://'
    a.replace a.content
  end
end
I get:
NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60>
EDIT:
I'm sure there is a cleaner way, but this seems to work:
item.css("a").each do |a|
  unless a["href"].blank?
    if !a["href"].starts_with? 'http://'
      a.replace a.content
    end
  end
end
The problem is that you're calling starts_with? on an object that doesn't implement it. item.css("a") returns the matching XML nodes in a NodeSet, and each node is a Nokogiri::XML::Element, not a String. What you want to check is not the node itself but its href value, which, because it's an attribute of the node, can be read as a string like this:
a['href']
So, you want to use something like this:
item.css("a").each do |a|
  if !a['href'].starts_with?('http://')
    a.replace(a.content)
  end
end
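One caveat with this loop: a['href'] is nil for an <a> tag that has no href attribute, and nil doesn't respond to starts_with? either. Here is a minimal sketch of a nil-safe check. The name http_link? is a hypothetical helper, not part of Nokogiri or ActiveSupport, and it relies on Ruby's built-in String#start_with?.

```ruby
# Hypothetical helper (not part of Nokogiri or ActiveSupport).
# Returns true only when href is a string beginning with "http://".
# nil.to_s is "", so a missing href attribute simply yields false.
def http_link?(href)
  href.to_s.start_with?('http://')
end

puts http_link?('http://bar')        # true
puts http_link?('doesnt_start_with') # false
puts http_link?(nil)                 # false
```

Inside the loop, the condition could then read: unless http_link?(a['href']). Note that anchors without an href would be replaced by their text under this check.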
The downside to this is that you have to walk through every <a> tag in the document, which can be slow on a big page with lots of links.
An alternate way to go about it is to use XPath's starts-with function. Given this document:
require 'nokogiri'
item = Nokogiri::HTML('<a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>')
puts item.to_html
which outputs:
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>
>> <a href="doesnt_start_with">foo</a><a href="http://bar">bar</a>
>> </body></html>
Here's how to do it using XPath:
item.search('//a[not(starts-with(@href, "http://"))]').each do |a|
  a.replace(a.content)
end
puts item.to_html
Which outputs:
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
>> <html><body>foo<a href="http://bar">bar</a>
>> </body></html>
The advantage of using XPath to find the nodes is that the search runs in compiled C rather than in Ruby, so only the nodes that actually need replacing are handed to the Ruby block.
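The XPath query above also selects <a> tags that have no href at all, because starts-with on a missing attribute evaluates to false. If those should be left alone, or if https links should be kept too, the expression can be extended. The sketch below only builds the XPath string. The method name non_matching_links_xpath is hypothetical, and it assumes the prefixes contain no double quotes.

```ruby
# Hypothetical helper: builds an XPath that selects <a> elements which have
# an href attribute whose value starts with none of the given prefixes.
# Assumes the prefixes contain no double quotes, because each one is embedded
# in a double-quoted XPath string literal.
def non_matching_links_xpath(*prefixes)
  raise ArgumentError, 'at least one prefix is required' if prefixes.empty?

  tests = prefixes.map { |prefix| %(starts-with(@href, "#{prefix}")) }
  "//a[@href and not(#{tests.join(' or ')})]"
end

puts non_matching_links_xpath('http://', 'https://')
# prints: //a[@href and not(starts-with(@href, "http://") or starts-with(@href, "https://"))]
```

The resulting string can be passed to item.search in place of the literal query, with the same replace loop as before.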
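Shouldn't that method be start_with? That is the name of Ruby's built-in String method; starts_with? is an alias that ActiveSupport adds on top of it.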
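For reference, a short sketch of the built-in method in plain Ruby, with no ActiveSupport loaded. String#start_with? also accepts several prefixes at once, which may be handy if https links should be kept as well (that goes beyond the original question, so treat it as an assumption).

```ruby
# Plain Ruby: String#start_with? needs no extra libraries.
url = 'https://example.com'

puts url.start_with?('http://')              # false
puts url.start_with?('http://', 'https://')  # true
```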