Web scraping using Ruby
I am new to programming and have a project where I have to write a Ruby script that retrieves information about a specified repository from GitHub, parses the data from JSON, and prints it in a usable format on the command line.
I have read through the Mechanize guide. Is there any other documentation I should look at to complete this?
Use GitHub's Repositories API. Everything you want is available there, without scraping or weird hacks, and responses are JSON-formatted by default.
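To illustrate the suggestion above, here is a minimal sketch that uses only Ruby's standard library (net/http and json). It assumes the REST endpoint layout https://api.github.com/repos/OWNER/NAME, which is not stated in this thread, and the helper names repo_uri, parse_repo, and fetch_repo are made up for the example.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build the repository endpoint URI.
# Assumption: the API serves repositories at /repos/OWNER/NAME.
def repo_uri(owner, name)
  URI("https://api.github.com/repos/#{owner}/#{name}")
end

# Turn a JSON response body into a Ruby Hash.
def parse_repo(body)
  JSON.parse(body)
end

# Fetch and parse one repository; raises if the request does not succeed.
def fetch_repo(owner, name)
  response = Net::HTTP.get_response(repo_uri(owner, name))
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_repo(response.body)
end

# Only touch the network when an owner and a repository name are given,
# e.g. `ruby repo_info.rb joncooper beanstalkd` (hypothetical file name).
if ARGV.length == 2
  repo = fetch_repo(ARGV[0], ARGV[1])
  puts "#{repo['full_name']}: #{repo['description']}"
end
```

Keeping the URI building and JSON parsing in small functions means they can be checked without a network connection; only fetch_repo performs the actual request.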
Following on from @Douglas' response: what you want to do is easy using the GitHub API and the HTTParty gem:
require 'httparty'

class Repository
  include HTTParty
  base_uri 'www.github.com'
end

response = Repository.get('/api/v2/json/repos/show/joncooper/beanstalkd')
require 'awesome_print'
>> ap response.parsed_response
{
    "repository" => {
                 "name" => "beanstalkd",
                 "size" => 128,
           "created_at" => "2011/04/29 09:43:43 -0700",
             "has_wiki" => true,
               "parent" => "kr/beanstalkd",
              "private" => false,
             "watchers" => 1,
                 "fork" => true,
             "language" => "C",
                  "url" => "https://github.com/joncooper/beanstalkd",
            "pushed_at" => "2011/07/05 22:10:53 -0700",
          "open_issues" => 0,
        "has_downloads" => true,
           "has_issues" => false,
             "homepage" => "http://kr.github.com/beanstalkd/",
                "forks" => 0,
          "description" => "Beanstalk is a simple, fast work queue.",
               "source" => "kr/beanstalkd",
                "owner" => "joncooper"
    }
}
See http://httparty.rubyforge.org/ for more.
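For the "usable format on the command line" part of the question, a sketch along these lines could turn the parsed hash into aligned label/value lines. It assumes the response shape shown above (a top-level "repository" key); the REPO_FIELDS list and the format_repo helper are illustrative names, and which fields to print is an arbitrary choice.

```ruby
# Keys taken from the response shown above; the selection is illustrative.
REPO_FIELDS = %w[name owner description language url watchers forks open_issues].freeze

# Build aligned "label : value" lines from a parsed response hash.
# Assumption: the repository data lives under the "repository" key.
def format_repo(parsed)
  repo = parsed.fetch('repository')
  width = REPO_FIELDS.map(&:length).max
  REPO_FIELDS.map { |key| "#{key.ljust(width)} : #{repo[key]}" }.join("\n")
end

# Sample data copied from the response above, so the sketch runs offline.
sample = {
  'repository' => {
    'name' => 'beanstalkd',
    'owner' => 'joncooper',
    'description' => 'Beanstalk is a simple, fast work queue.',
    'language' => 'C',
    'url' => 'https://github.com/joncooper/beanstalkd',
    'watchers' => 1,
    'forks' => 0,
    'open_issues' => 0
  }
}

puts format_repo(sample)
```

With HTTParty, the same helper could be called as format_repo(response.parsed_response), provided the response has the shape assumed here.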