Convert URLs in an HTML document?

I have an HTML document on foo.com that contains links, forms, and asset URLs (images, JavaScript).

I want to serve it on bar.com without frames. I also want all relative URLs to be translated to absolute URLs with a host name of "bar.com", including the asset URLs and form action URLs.

I fetched the HTML document from foo.com. What are the next steps to transform the URLs in it using Nokogiri?


Nokogiri is an HTML/XML parser. You can follow the official tutorial to learn how to parse your document.

Here is an example:

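require 'nokogiri'
require 'open-uri' # only needed if the input comes from the Internet
require 'uri'

# Open the remote document. Kernel#open no longer accepts URLs in Ruby 3,
# so use URI.open; for a local file, pass File.open(path) instead.
# DOCUMENT_URL and URL_PREFIX are placeholders, e.g. 'http://bar.com/'.
doc = Nokogiri::HTML(URI.open(DOCUMENT_URL))

# Search for img tags that have a src attribute:
doc.css('img[src]').each do |img|
  # Resolve src against the prefix; already-absolute URLs stay unchanged
  img['src'] = URI.join(URL_PREFIX, img['src']).to_s
end

# Print the modified HTML
puts doc.to_html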
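
Plain string concatenation of a prefix and a path gives malformed results for root-relative or already-absolute URLs, whereas Ruby's standard URI.join resolves them the way a browser would. Here is a minimal sketch of that resolution step on its own; the helper name absolutize and the sample URLs are made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: resolve a possibly relative reference against a base URL.
def absolutize(base, ref)
  URI.join(base, ref).to_s
rescue URI::Error
  ref # leave values that cannot be parsed untouched
end

base = 'http://bar.com/docs/index.html'

puts absolutize(base, 'images/logo.png')              # relative to the base directory
puts absolutize(base, '/js/app.js')                   # relative to the host root
puts absolutize(base, 'http://cdn.example.com/x.css') # already absolute, unchanged
```

The same helper can be applied to any attribute value that Nokogiri returns.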


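require 'nokogiri'
require 'open-uri'
require 'uri'

url = 'http://www.google.com'
doc = Nokogiri::HTML(URI.open(url))
# Only visit links that actually carry an href, so rel_url is never nil
doc.xpath('//a[@href]').each do |d|
  rel_url = d.get_attribute('href')
  # URI.join resolves relative paths and keeps absolute URLs intact
  d.set_attribute('href', URI.join('http://www.xyz.com/', rel_url).to_s)
end
puts doc.to_html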