Web scraping using Ruby

I am new to programming and I have a project where I have to write a Ruby script to retrieve info on a specified repository from GitHub, parse the data from JSON format, and print it in a usable format on the command line.

I have checked out the Mechanize guide. Is there any documentation I can check in order to complete this?


Use GitHub's Repositories API. It provides everything you want without scraping or weird hacks, and responses are JSON-formatted by default.
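To make that concrete, here is a minimal sketch using only Ruby's standard library. It assumes the current REST endpoint `https://api.github.com/repos/OWNER/REPO`; the helper names `repo_uri`, `parse_repo`, and `fetch_repo` are made up for illustration and are not part of any library.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the URL for one repository.
# Assumption: GitHub's current REST API layout, /repos/OWNER/REPO.
def repo_uri(owner, name)
  URI("https://api.github.com/repos/#{owner}/#{name}")
end

# Turn a JSON response body into a plain Ruby Hash.
def parse_repo(body)
  JSON.parse(body)
end

# Fetch and parse in one step; raises if the server does not answer with 2xx.
def fetch_repo(owner, name)
  response = Net::HTTP.get_response(repo_uri(owner, name))
  raise "GitHub returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  parse_repo(response.body)
end
```

Calling `fetch_repo('joncooper', 'beanstalkd')` should return a Hash you can index by key. Keeping the parsing separate from the network call makes it possible to try the parsing on a saved JSON string first.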


Following on from @Douglas' response: what you want to do is easy using the GitHub API and the HTTParty gem:

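require 'httparty'

# Thin client class: including HTTParty adds class-level helpers such as .get
class Repository
  include HTTParty
  base_uri 'www.github.com'
end

# Legacy API v2 path; HTTParty parses the JSON body into a Hash.
# Note: GitHub has since retired API v2; this path matches the output shown below.
response = Repository.get('/api/v2/json/repos/show/joncooper/beanstalkd')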

require 'awesome_print'
>> ap response.parsed_response
{
    "repository" => {
                 "name" => "beanstalkd",
                 "size" => 128,
           "created_at" => "2011/04/29 09:43:43 -0700",
             "has_wiki" => true,
               "parent" => "kr/beanstalkd",
              "private" => false,
             "watchers" => 1,
                 "fork" => true,
             "language" => "C",
                  "url" => "https://github.com/joncooper/beanstalkd",
            "pushed_at" => "2011/07/05 22:10:53 -0700",
          "open_issues" => 0,
        "has_downloads" => true,
           "has_issues" => false,
             "homepage" => "http://kr.github.com/beanstalkd/",
                "forks" => 0,
          "description" => "Beanstalk is a simple, fast work queue.",
               "source" => "kr/beanstalkd",
                "owner" => "joncooper"
    }
}
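For the "usable format on the command line" part of the question, one option is to pick a few keys out of the parsed hash and print them as aligned lines. This is only a sketch: `format_repository` and its default field list are hypothetical, and the sample hash simply copies a few values from the output above so the snippet runs without a network call.

```ruby
# Format selected keys of the "repository" hash as aligned "label : value" lines.
# Hypothetical helper; the default field list is an arbitrary choice for illustration.
def format_repository(parsed, fields = %w[name owner description language url watchers forks])
  repo = parsed.fetch("repository")
  width = fields.map(&:length).max
  fields.map { |field| "#{field.ljust(width)} : #{repo[field]}" }.join("\n")
end

# Sample data copied from the response shown above.
sample = {
  "repository" => {
    "name"        => "beanstalkd",
    "owner"       => "joncooper",
    "description" => "Beanstalk is a simple, fast work queue.",
    "language"    => "C",
    "url"         => "https://github.com/joncooper/beanstalkd",
    "watchers"    => 1,
    "forks"       => 0
  }
}

puts format_repository(sample)
```

With the HTTParty example, you would pass `response.parsed_response` instead of `sample`.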

See http://httparty.rubyforge.org/ for more.
