
Read a local HTML file with Mechanize

I am building a crawler. I know how to use Ruby Mechanize to read a page from the web using this code:

require 'mechanize'
agent = Mechanize.new
agent.get "http://google.com"

But can I use Mechanize to read an HTML file from the file system? How?


Just using the file:// protocol worked great for me:

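require 'mechanize'
agent = Mechanize.new
html_dir = File.expand_path(File.dirname(__FILE__)) # absolute directory of this script
page = agent.get("file://#{html_dir}/example-file.html") # e.g. file:///home/me/spec/example-file.html on Unix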

As for the question of why anyone would use Mechanize to read local HTML files: I found it necessary for testing. Just store an example file locally and run your RSpec specs against it.


I was unable to get the file:// protocol to work correctly. Instead, I saved a web page locally and used FakeWeb to register its URI:

require 'fakeweb'
stream = File.read("saved_google_page.html")
FakeWeb.register_uri(:get, "http://www.google.com", :body => stream, :content_type => "text/html")

FakeWeb then returns the saved page behind the scenes when Mechanize makes an ordinary request:

require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.google.com/")

See the question "How to test a Ruby application which uses Mechanize".


Building on @Stephen's answer: since FakeWeb hasn't been updated in a long time and its maintenance status is unclear, here is the same workaround using WebMock, for whoever is in a hurry:

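require 'mechanize'
require 'webmock'
include WebMock::API

WebMock.enable!
# Serve the saved page for GETs to www.example.com. The Content-Type header makes
# Mechanize return a Mechanize::Page rather than a plain Mechanize::File.
stub_request(:get, "www.example.com").to_return(body: File.read("page.html"), headers: { "Content-Type" => "text/html" })

agent = Mechanize.new
page = agent.get("http://www.example.com/")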

# ...


IMHO it doesn't make sense to use Mechanize in this situation. If you just want to parse HTML, try Nokogiri (Mechanize uses it for parsing too).

For example, use

Nokogiri::HTML(File.read('index.html'))

instead of

agent.get('http://www.google.com')