Read a local HTML file with Mechanize
I am building a crawler. I know how to use Ruby Mechanize to read a page from the web with this code:
require 'mechanize'
agent = Mechanize.new
agent.get "http://google.com"
But can I use Mechanize to read an HTML file from the file system? How?
Just using the file:// protocol worked great for me:
html_dir = File.dirname(__FILE__)
page = agent.get("file:///#{html_dir}/example-file.html")
As for the question of why someone would use Mechanize to read local HTML files: I found it necessary for testing purposes. Just store an example file locally and run your RSpec specs against it.
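One possible pitfall with the snippet above: `File.dirname(__FILE__)` can be a relative path such as `.` when the script is run directly, which would produce a URL that does not point where you expect. A minimal sketch of one way around that, assuming a Unix-like system; `local_file_url` and its `base_dir` parameter are hypothetical names for illustration, not part of Mechanize:

```ruby
# Hypothetical helper (not part of Mechanize): turn a path into a file:// URL.
# Assumes a Unix-like system, where absolute paths start with "/".
def local_file_url(path, base_dir = Dir.pwd)
  # File.expand_path makes the path absolute and resolves "." and ".." segments.
  absolute = File.expand_path(path, base_dir)
  "file://#{absolute}"
end

puts local_file_url("example-file.html", "/tmp/fixtures")
puts local_file_url("../pages/index.html", "/tmp/fixtures")
```

In a spec this could be used as `agent.get(local_file_url("example-file.html", __dir__))`. The sketch does not escape spaces or other special characters in the path, so it is only suitable for simple file names.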
I was unable to get the file:// protocol to work correctly for me. Instead I used FakeWeb by saving a web page locally and registering the URI
stream = File.read("saved_google_page.html")
FakeWeb.register_uri(:get, "http://www.google.com", :body => stream, :content_type => "text/html")
and then letting FakeWeb return it behind the scenes during a normal Mechanize request:
agent = Mechanize.new
page = agent.get("http://www.google.com/")
See also: How to test a ruby application which uses mechanize
Building on the answer by @Stephens: since FakeWeb has not been updated for a long while and its maintainer situation is unclear, here is an answer that works around the issue using WebMock, for whoever is in a hurry:
require 'mechanize'
require 'webmock'
include WebMock::API
WebMock.enable!
stub_request(:get, "www.example.com").to_return(body: File.read("page.html"))
agent = Mechanize.new
page = agent.get("http://www.example.com/")
# ...
IMHO it doesn't make sense to use Mechanize in this situation. If you just want to parse HTML, try Nokogiri (Mechanize uses it for parsing too).
For example, use
Nokogiri::HTML(File.read('index.html'))
instead of
session.get('http://www.google.com')