
Use Ruby to get the content length of URLs

I am trying to write a Ruby script that uses net/http to get some details about files on a website. My code looks like this:

require 'open-uri'
require 'net/http'

# asset is the URL string of the file to inspect
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) {|http|
  http.get(url.request_uri)
}

headers = res.to_hash
p headers

I would like to get two pieces of information from this request: the total length of the content when inflated and, where applicable, the length of the content when compressed.

Sometimes, the headers will include a content-length parameter, which appears to be the gzipped length of the content. I can also approximate the inflated size of the content using res.body.length, but this has not been foolproof by any stretch of the imagination. The documentation on net/http says that gzip headers are removed from the list automatically (to help me, gee thanks) so I cannot seem to get a reliable handle on this information.

Any help is appreciated (including other gems if they will do this more easily).


Got it! The "magic" behavior here only occurs if you don't specify your own accept-encoding header. Amended code as follows:

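require 'net/http'
require 'zlib'
require 'stringio'

# Sending our own accept-encoding header turns off the automatic decompression,
# so content-encoding and content-length stay in the response headers.
request_headers = { "accept-encoding" => "gzip;q=1.0,deflate;q=0.6,identity;q=0.3" }
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) {|http|
  http.get(url.request_uri, request_headers)
}

# to_hash returns lowercase header names mapped to arrays of values
headers = res.to_hash

gzipped = headers['content-encoding'] && headers['content-encoding'][0] == "gzip"
content = gzipped ? Zlib::GzipReader.new(StringIO.new(res.body)).read : res.body

# bytesize counts bytes; length would count characters in a multibyte string
full_length = content.bytesize
compressed_length = (headers["content-length"] && headers["content-length"][0] || res.body.bytesize).to_i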
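
As a rough sketch, the size calculation above can be pulled into a small helper that works on a raw body and a header hash, which makes it easy to try without a network connection. The name content_sizes is hypothetical, the deflate branch is an addition that assumes the server sends zlib-wrapped data (some servers send raw deflate, which this does not handle), and the header hash is assumed to have the shape returned by to_hash.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: returns [inflated_bytes, compressed_bytes].
# raw_body is the body exactly as received (still encoded);
# headers has lowercase keys mapped to arrays of strings, like res.to_hash.
def content_sizes(raw_body, headers)
  encoding = (headers['content-encoding'] || []).first.to_s.downcase

  inflated =
    case encoding
    when 'gzip', 'x-gzip'
      Zlib::GzipReader.new(StringIO.new(raw_body)).read
    when 'deflate'
      # Assumes zlib-wrapped deflate data
      Zlib::Inflate.inflate(raw_body)
    else
      raw_body
    end

  # Prefer the declared content-length; fall back to the bytes actually received
  declared = (headers['content-length'] || []).first
  compressed = declared ? declared.to_i : raw_body.bytesize

  [inflated.bytesize, compressed]
end
```

With the response from the code above, this would be called as content_sizes(res.body, res.to_hash).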


You can try using sockets to send a HEAD request to the server, which is faster (no content is returned), and leave out "Accept-Encoding: gzip" so the response will not be gzipped.
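
A HEAD request can also be sent with Net::HTTP instead of raw sockets. The following is a minimal sketch under that assumption; head_content_length is a hypothetical name, and the result is only as good as the server's headers, since a server may omit content-length on a HEAD response.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: sends a HEAD request and returns the declared
# content-length as an Integer, or nil if the server did not send one.
def head_content_length(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # Ask for the unencoded representation so the reported length
    # refers to the uncompressed content
    response = http.head(uri.request_uri, 'Accept-Encoding' => 'identity')
    response.content_length
  end
end
```

This only gives the uncompressed size. To learn the compressed size as well, a second HEAD request with "Accept-Encoding: gzip" could be sent, assuming the server reports a length for the compressed representation.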
