How to pass cookies from one page to another using curl in Ruby?

I am writing a video crawler in Ruby. It has to log in to a site with cookies enabled and then download pages. I am using the curl library for Ruby (Curl::Easy) for this. I can log in successfully, but I can't download the pages behind the login with curl. How can I fix this, or download the pages some other way?

My code is

curl = Curl::Easy.new(1st url)
curl.follow_location = true
curl.enable_cookies = true
curl.cookiefile = "cookie.txt"
curl.cookiejar = "cookie.txt"
curl.http_post(1st url,field)

curl.perform
curl = Curl::Easy.perform(2nd url)
curl.follow_location = true
curl.enable_cookies = true
curl.cookiefile = "cookie.txt"
curl.cookiejar = "cookie.txt"
curl.http_get

code = curl.body_str


While writing my own similar "post-then-get" script, I found that Curb (I'm using version 0.7.15 with Ruby 1.8) seems to ignore the cookiejar and cookiefile fields of a Curl::Easy object. If I set either field and the http_post completes successfully, no cookie file is created. curl.cookies will also still be nil after your curl.http_post. The cookies ARE set within the curl object, though. I promise :)

I think where you're going wrong is here:

curl = Curl::Easy.perform(2nd url)

The curb documentation states that this creates a new object. That new object doesn't have any of your existing cookies set. If you change your code to look like the following, I believe it should work. I've also removed the curl.perform for the first url since curl.http_post already implicitly does the "perform". You were basically http_post'ing twice before trying your http_get.

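require 'curb'

# first_url, second_url and fields stand in for your own values.
curl = Curl::Easy.new(first_url)
curl.follow_location = true
curl.enable_cookies = true
# The instance method posts to the URL already set on the object,
# so it takes only the form fields. It also performs the request.
curl.http_post(fields)

# Reuse the same object so the cookies from the login are sent along.
curl.url = second_url
curl.http_get

code = curl.body_str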

If this still doesn't seem to be working for you, you can verify if the cookie is getting set by adding

curl.verbose = true

before

curl.http_post

Your Curl::Easy object will dump all the headers that it gets in the response from the server to $stdout, and somewhere in there you should see a line stating that it added/set a cookie. I don't have any example output right now but I'll try to post a follow-up soon.
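
If reading the verbose dump by eye gets tedious, the same check can be done in code by scanning the raw response headers for Set-Cookie lines. The sketch below is only an illustration: cookies_from_headers is a hypothetical helper name, the sample header text is made up, and it keeps just the name=value part of each cookie (attributes such as Path or Expires are dropped).

```ruby
# Hypothetical helper: collect the name=value pair from every
# Set-Cookie line in a raw HTTP response header string.
def cookies_from_headers(header_str)
  header_str.each_line.map { |line|
    name, value = line.chomp.split(':', 2)
    # Skip the status line, blank lines and every other header.
    next unless value && name.strip.casecmp('set-cookie') == 0
    # Keep only the part before the first ';' (drops Path, Expires, ...).
    pair = value.split(';', 2).first.to_s.strip
    pair.empty? ? nil : pair
  }.compact
end

# Made-up sample headers, standing in for a real login response.
sample = "HTTP/1.1 200 OK\r\n" \
         "Content-Type: text/html\r\n" \
         "Set-Cookie: session=abc123; Path=/; HttpOnly\r\n" \
         "set-cookie: lang=en\r\n" \
         "\r\n"

cookies = cookies_from_headers(sample)
# Value a client would send back in its Cookie request header.
cookie_header = cookies.join('; ')
puts cookie_header
```

With Curb, the string to pass in would be curl.header_str after the http_post. An empty result would suggest that the login response never set a cookie, in which case the problem is in the POST rather than in the GET that follows.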


HTTPClient automatically enables cookies, as does Mechanize.

From the HTTPClient docs:

clnt = HTTPClient.new
clnt.get_content(url1) # receives Cookies.
clnt.get_content(url2) # sends Cookies if needed.

Posting a form is easy too:

body = { 'keyword' => 'ruby', 'lang' => 'en' }
res = clnt.post(uri, body)


Mechanize makes this sort of thing really simple (It will handle storing the cookies, among other things).
