Convert URLs in HTML document?
I have an HTML document on foo.com, which contains links, forms, and asset URLs (images/JavaScript).
I want to serve it on bar.com without frames. I also want all relative URLs to be translated to absolute URLs with the host name "bar.com", including the asset URLs and form action URLs.
I fetched the HTML document from foo.com. What are the next steps to transform the URLs in it using Nokogiri?
Nokogiri is an HTML/XML parser. You could follow the official tutorial to find out how to parse your document.
Here is an example:
require 'nokogiri'
require 'open-uri' # needed only when the input comes from the Internet

# Open the remote document (for a local file, pass File.open(path) instead)
doc = Nokogiri::HTML(URI.open(URL_OR_PATH_TO_DOCUMENT))

# Search for img tags
doc.css('img').each do |img|
  next unless img['src'] # skip images that have no src attribute
  # Resolve src against the new base; URLs that are already absolute stay unchanged
  img['src'] = URI.join(URL_PREFIX, img['src']).to_s
end

# Print the modified HTML
puts doc.to_html
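Relative references come in several shapes (path-relative, root-relative, already absolute), and plain string concatenation handles only some of them. The following is a minimal sketch, using only Ruby's standard `uri` library, of how `URI.join` resolves each shape against a base. The `BASE` value and the `absolutize` helper are hypothetical names chosen for illustration, not part of Nokogiri.

```ruby
require 'uri'

# Hypothetical base URL: the address the page will be served from.
BASE = 'http://bar.com/docs/index.html'

# Resolve one URL reference against a base URL.
# Returns the reference unchanged if it cannot be parsed as a URI.
def absolutize(ref, base = BASE)
  URI.join(base, ref).to_s
rescue URI::InvalidURIError
  ref
end

puts absolutize('logo.png')           # path-relative reference
puts absolutize('/css/site.css')      # root-relative reference
puts absolutize('http://foo.com/a')   # already absolute, left as-is
puts absolutize('a b.png')            # not a valid URI, returned untouched
```

A helper like this can then be applied to each attribute value that Nokogiri returns, instead of prepending a prefix by hand.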
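require 'nokogiri'
require 'open-uri'

url = 'http://www.google.com'
doc = Nokogiri::HTML(URI.open(url))

# Select only anchors that actually have an href attribute
doc.xpath('//a[@href]').each do |a|
  rel_url = a.get_attribute('href')
  # URI.join resolves relative URLs and leaves absolute ones unchanged
  a.set_attribute('href', URI.join('http://www.xyz.com/', rel_url).to_s)
end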