Using Mechanize with Google Docs

2023-01-02 14:54

I'm trying to use Mechanize to log in to Google Docs so that I can scrape something (not possible from the API), but I keep getting a 404 when trying to follow the meta redirect:

require 'rubygems'
require 'mechanize'

USERNAME = "..."
PASSWORD = "..."

LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/"

agent = Mechanize.new
login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = USERNAME
login_form.Passwd = PASSWORD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

puts "redirect: #{redirect}"

followed_page = agent.get(redirect) # throws a HTTPNotFound exception

pp followed_page

Can anyone see why this isn't working?

Andy, you're awesome! Your code helped me get my script working and log in to a Google account. I found your error after a couple of hours: it was about HTML escaping. As far as I can tell, Mechanize automatically escapes the URI it receives as a parameter to the 'get' method, and the URI taken from the meta tag is already escaped. So my solution is:

require 'rubygems'
require 'mechanize'
require 'logger'

EMAIL  = ".."
PASSWD = ".."
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"

agent = Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive   = true
agent.redirect_ok  = true

login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

# Drop the last query parameter (the already-escaped continue value from the
# meta tag) and append an unescaped one in its place.
target = redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner"
puts target
followed_page = agent.get(target)
pp followed_page

This works just fine for me. I replaced the continue parameter from the meta tag (which is already escaped) with a new one.
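One caveat with the split('&') approach: it assumes continue is the last query parameter. As a rough sketch, the same replacement can be done by parameter name with Ruby's standard URI library. The helper name with_continue and the example.com URL below are made up for illustration and are not part of Mechanize. Note that this version percent-encodes the new value once (normal form encoding); I have not checked how that interacts with Mechanize's own escaping, so treat it as a starting point rather than a drop-in fix.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): returns url with its
# "continue" query parameter replaced by target, wherever it appears.
def with_continue(url, target)
  uri = URI.parse(url)
  # Decode the query string into [key, value] pairs; empty if no query.
  params = URI.decode_www_form(uri.query || '')
  # Remove any existing continue parameter, then append the new one.
  params.reject! { |key, _value| key == 'continue' }
  params << ['continue', target]
  # Re-encode the pairs; each value is percent-encoded exactly once.
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Made-up redirect URL, standing in for the one read from the meta tag.
redirect = 'https://example.com/check?hl=en&continue=http%3A%2F%2Fdocs.google.com%2F&x=1'
puts with_continue(redirect, 'https://www.google.com/adplanner')
```

In the script above, the result of with_continue(redirect, ...) would be passed to agent.get in place of the string built with split and join.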
