Automating forms and scraping on a site using frames (using Mechanize)

I am trying to enter data into a form and then scrape the results on a site that uses frames. I've been using Mechanize (the Ruby gem) to fill in the forms, which works fine. The problem is that Mechanize treats frames as links: to "load" a frame and "see" the forms it contains, you have to "click" the frame and load its page as a separate HTML page.

Since this site uses separate frames for authentication, search forms, and results, I can't click into a frame, fill in its form, and then reach the results frame to see the data the form generates, because I am stuck in the frame I clicked into. If I try to go back by loading the original URL, I lose what I did in the previous frame.

If there is an app that loads all the content from all the frames without having to click on them, that would be perfect. I haven't found one yet.

Is there a way to do this in Ruby, or is there any app that does what Mechanize does (and works with Nokogiri) but also loads frames?


Mechanize has some support for sessions. Does the website not keep you logged in if you click through to the login page, then call back() and click through to the search page?
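A minimal sketch of that idea, assuming the site tracks the login with cookies: one Mechanize agent keeps one cookie jar, so each frame can be fetched through it without losing the session. The URL, the frame names (`login_frame`, `search_frame`) and the field names are hypothetical placeholders, and as far as I know a Mechanize frame exposes `name` and `content` (the latter fetches the frame's `src` with the same agent).

```ruby
# Return the page behind the frame called `name`.
# `page` only needs to respond to #frames, and each frame to #name and
# #content, which is what Mechanize pages and frames provide.
def frame_page(page, name)
  frame = page.frames.find { |f| f.name == name }
  raise ArgumentError, "no frame named #{name.inspect}" unless frame
  frame.content
end

# Hypothetical walk-through; every name below is an assumption about the site.
def run_example(url)
  require 'mechanize'                 # gem install mechanize
  agent = Mechanize.new               # one agent, one cookie jar for all frames
  frameset = agent.get(url)

  login = frame_page(frameset, 'login_frame').forms.first
  login['username'] = 'alice'
  login['password'] = 'secret'
  login.submit                        # any session cookie is stored in the agent

  # Fetch the search frame only after logging in, so it sees the session.
  search = frame_page(frameset, 'search_frame').forms.first
  search['q'] = 'widgets'
  search.submit                       # the response page the form produces
end
```

Whether this survives on a given site depends on how it ties the session to frame navigation, so treat it as something to try rather than a guaranteed fix.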

When forms have frustrated me in the past, I've often resorted to using LiveHTTPHeaders (or a similar plugin) to detect the POSTs that are being carried out when logging in and searching, and then performing those without going through the pages themselves.

I'm not sure how well that will work with the authentication though.
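For what it's worth, replaying a captured POST could look roughly like the sketch below, which uses only Ruby's standard `net/http`. The URLs and field names are made up for illustration; the real ones would come from whatever the header-sniffing plugin shows. Passing the login cookie along by hand is my assumption about how the authentication might be carried over.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a form POST like the one the browser made.
def build_form_post(url, fields, cookie = nil)
  request = Net::HTTP::Post.new(URI(url))
  request.set_form_data(fields)          # url-encodes the fields into the body
  request['Cookie'] = cookie if cookie   # reuse the session from the login step
  request
end

# Turn the Set-Cookie headers of a response into a Cookie header value.
def cookie_header(response)
  response.get_fields('Set-Cookie').to_a
          .map { |c| c.split(';').first }
          .join('; ')
end

# Send a prepared request and return the Net::HTTPResponse.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Hypothetical usage (placeholder URLs and field names):
#   login   = send_request(build_form_post('http://example.com/login',
#                                          { 'user' => 'alice', 'pass' => 'secret' }))
#   results = send_request(build_form_post('http://example.com/search',
#                                          { 'q' => 'widgets' }, cookie_header(login)))
#   puts results.body
```

If the site also relies on hidden form fields or tokens, those would need to be copied from the captured request as well.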


To elaborate on Ben's response, I thought I would post my solution to the problem of Mechanize not being able to enter a frame and then navigate back out of it. On my particular site, navigating back deauthenticates you. His suggestion of calling back() probably works for most sites, but I ended up taking a different route in the meantime.

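I used Firewatir to pass data to forms through the Firefox browser. The code to access an element in a frame looks like this, where `field_type` stands for an element method such as `text_field` or `button` and `action` for a call such as `set` or `click`: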

    b.frame(:name, "frame_name").field_type(:name, "field_name").action

Since you don't have to navigate to a frame in this situation, you don't have to worry about deauthentication or dependent frames reloading as you move back and forth. Although Mechanize is a useful tool, I found Firewatir to be the better option for working with frames under the conditions described above.
