Read a local HTML file with Mechanize

2023-04-08 04:55 Q&A author:

I am building a crawler. I know how to use Ruby's Mechanize to read a page from the web with this code:

require 'mechanize'
agent = Mechanize.new
agent.get "http://google.com"

But can I use Mechanize to read an HTML file from the file system? How?

Just using the file:// protocol worked great for me:

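# Expand to an absolute path so the file:// URL does not depend on the working directory
html_dir = File.expand_path(File.dirname(__FILE__))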
page = agent.get("file:///#{html_dir}/example-file.html")

As for the question of why someone would use Mechanize to read local HTML files: I found it necessary for testing purposes. Just store an example file locally and run your RSpec suite against it.
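
The file:// approach above can be wrapped in a small helper that resolves the path and fails early when the fixture is missing. This is only a sketch: `local_file_url` is a hypothetical name, not part of Mechanize, and it assumes POSIX-style paths without spaces or other characters that would need URL escaping.

```ruby
# Hypothetical helper (not part of Mechanize): build a file:// URL for a local file.
# Assumes POSIX-style absolute paths such as /home/me/fixtures/page.html.
def local_file_url(path, base_dir = Dir.pwd)
  # Resolve the path against base_dir so the result is always absolute
  absolute = File.expand_path(path, base_dir)
  # Fail early with a clear message instead of a fetch error later on
  raise ArgumentError, "no such file: #{absolute}" unless File.file?(absolute)
  "file://#{absolute}"
end
```

With an agent created as in the question, the call would then look like `agent.get(local_file_url("example-file.html", File.dirname(__FILE__)))`.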

I was unable to get the file:// protocol to work correctly for me. Instead I used FakeWeb, saving a web page locally and registering the URI:

stream = File.read("saved_google_page.html")
FakeWeb.register_uri(:get, "http://www.google.com", :body => stream, :content_type => "text/html")

and then having FakeWeb return it behind the scenes during a normal Mechanize request:

agent = Mechanize.new
page = agent.get("http://www.google.com/")

See "How to test a ruby application which uses mechanize".

Building on @Stephens' answer: since FakeWeb has not been updated for a long while and its maintainer situation is unclear, here is an answer that works around the issue using WebMock, for whoever is in a hurry:

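require 'mechanize'
require 'webmock'
include WebMock::API

WebMock.enable!
# Serve the saved page as text/html so Mechanize parses it as an HTML page
stub_request(:get, "www.example.com").to_return(
  body: File.read("page.html"),
  headers: { "Content-Type" => "text/html" }
)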

agent = Mechanize.new
page = agent.get("http://www.example.com/")

# ...

IMHO it doesn't make sense to use Mechanize in this situation. Maybe you just want to parse HTML. If so, try Nokogiri (Mechanize uses it for parsing too).

e.g. use

Nokogiri::HTML(File.read('index.html'))

instead of

session.get('http://www.google.com')
