开发者

How do I fetch AJAX-loaded content from an another site using Nokogiri?

I was trying to parse some HTML content from a site. Nokogiri works perfectly for content loaded the first time.

Now the issue is how to fetch that content which is loaded using AJAX. For example, there is a "see more" link and more items are开发者_如何学运维 fetched using AJAX, or consider a case for AJAX-based tabs.

How can I fetch that content?


You won't be able to parse anything that requires a JavaScript runtime to produce that content using Nokogiri. Nokogiri is a HTML/XML parser, not a web browser.

PhantomJS on the other hand is a web browser, albeit a special kind of browser ;) Take a look at that and have a play.


It isn't completely clear what you want to do, but if you are trying to get access to additional HTML that is loaded by AJAX, then you will need to study the code, figure out what URL is being used for the AJAX request, whether any session IDs or cookies have been set, then create a new URL that reproduces what AJAX is using. Request that, and you should get the new content back.

That can be difficult to do though. As @Nuby said, Mechanize could be good help, as it is designed to manage cookies and sessions for you in the background. Mechanize uses Nokogiri internally so if you request a page from Mechanize, you can use Nokogiri searches against it to drill down and extract any particular JavaScript strings. They'll be present as text, so then you can use regex or substring matches to get at the particular parameters you need, then construct the new URL and ask Mechanize to get it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜