Rails architecture questions
I'm building a Rails site that, among other things, allows users to build their own recipe repository. Recipes are entered either manually or via a link to another site (think epicurious, cooks.com, etc). I'm writing scripts that will scrape a recipe from these sites given a link from a user, and so far (legal issues notwithstanding) that part isn't giving me any trouble.
However, I'm not sure where to put the code that I'm writing for these scraper scripts. My first thought was to put it in the recipes model, but it seems a bit too involved to go there; would a library or a helper be more appropriate?
Also, as I mentioned, I'm building several different scrapers for different food websites. It seems to me that the elegant way to do this would be to define an interface (or abstract base class) that determines a set of methods for constructing a recipe object given a link, but I'm not sure what the best approach would be here, either. How might I build out these OO relationships, and where should the code go?
You've got two obvious sides to this. The first is how you will store the recipes: that's your models. The models will not be doing the scraping, since they have a single responsibility: storing valid data. Your controller(s), which will initiate the scraping and storage process, should not contain the scraping code either (though they will call it).
In Ruby we don't go for abstract classes or interfaces; the language is duck-typed, so it's enough that your scrapers implement a known method or set of methods. Still, your scraping engines should all be similar, especially in terms of the public methods they expose.
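For instance, a minimal sketch of what that duck-typed convention might look like (the class names here are illustrative, not from this answer):

class EpicuriousScraper
  def initialize(url)
    @url = url
  end

  def scrape
    # epicurious-specific parsing; returns a recipe attributes hash
  end
end

class CooksScraper
  def initialize(url)
    @url = url
  end

  def scrape
    # cooks.com-specific parsing; same return shape
  end
end

# No interface declaration needed: any object that responds to #scrape will do.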
You will put your scrapers -- and here's the lame answer -- wherever you want. lib is fine, but making them a plugin might not be a bad idea either. See my question here - with a stunning answer by famous Rails-guy Yehuda Katz - for some other ideas, but in general: there is no right answer. There are some wrong ones, though.
The scraping engine should be a standalone plugin or gem. For a quick-and-dirty approach, you can put it inside lib; that's the usual convention anyway. It should probably implement a factory class that instantiates the right kind of scraper for a given URL, so that client usage is as simple as:
Scraper.scrape(url)
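A minimal sketch of what such a factory could look like (the per-site scraper classes are illustrative assumptions, not part of this answer):

require 'uri'

class Scraper
  def self.scrape(url)
    scraper_for(url).scrape
  end

  def self.scraper_for(url)
    # pick a scraper class based on the URL's host
    case URI.parse(url).host
    when /epicurious\.com\z/ then EpicuriousScraper.new(url)
    when /cooks\.com\z/      then CooksScraper.new(url)
    else raise ArgumentError, "no scraper registered for #{url}"
    end
  end
end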
Also, if this is a long-running task, you might want to consider using Resque or delayed_job to offload the work to separate processes.
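For the background option, a minimal Resque worker sketch (assuming the resque gem, and reusing the hypothetical Scraper.scrape factory above):

class ScrapeJob
  @queue = :scraping

  def self.perform(url)
    Scraper.scrape(url)
  end
end

# Enqueue from wherever the user submits the link:
# Resque.enqueue(ScrapeJob, url)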
Try focusing on getting the thing working first before moving it to a gem/plugin. Also, forget about interfaces / abstract classes -- just write the code that does the thing. The only things your model should know are whether it's a remote recipe and what its URL is. You could put all the scraping code in app/scrapers. Here's an example implementation outline:
class RecipePage
  def initialize(url)
    @url = url
    @parser = get_parser
  end

  def get_attributes
    raise "trying to scrape unknown site" unless @parser
    @parser.recipe_attributes(get_html)
  end

  private

  def get_html
    # use your favorite HTTP library to fetch the HTML from @url
  end

  def get_parser
    # match @url to a parser class (e.g. return the domain camelized),
    # or nil if you are not handling that particular site yet
    EpicurusComParser
  end
end
class EpicurusComParser
  def self.recipe_attributes(html)
    # this does the hard job of querying the html to extract the
    # title, text, and image of the recipe, returned as a hash
    {
      :title => "recipe title",
      :text => "recipe text",
      :image => "recipe_image_url",
    }
  end
end
Then, in your model:
class Recipe < ActiveRecord::Base
  after_create :scrape_recipe, :if => :recipe_url

  private

  def scrape_recipe
    # do this in the background, i.e. in a DelayedJob
    recipe_page = RecipePage.new(recipe_url)
    self.update_attributes(recipe_page.get_attributes.merge(:scraped => true))
  end
end
Then you can create more parsers, e.g. CooksComParser, etc.
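Once several parsers exist, get_parser might dispatch on the domain. A rough sketch (the PARSERS mapping and CooksComParser are assumptions for illustration):

require 'uri'

class RecipePage # reopening the class from the outline above
  PARSERS = {
    "epicurious.com" => EpicurusComParser,
    "cooks.com"      => CooksComParser,
  }

  private

  def get_parser
    host = URI.parse(@url).host.to_s.sub(/\Awww\./, "")
    PARSERS[host] # nil for sites we do not handle yet
  end
end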
Often, utility classes that aren't really part of the MVC design are put into the lib folder. I've also seen people put them into the models folder, but lib is really the "correct" place.
You could then create an instance of the recipe scraper within the controller as required, feeding the data into the model.
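A rough sketch of that controller flow (RecipeScraper and its attributes method are hypothetical names for a class kept in lib):

class RecipesController < ApplicationController
  def create
    scraper = RecipeScraper.new(params[:url])   # utility class from lib/
    @recipe = Recipe.new(scraper.attributes)    # feed the scraped data into the model
    if @recipe.save
      redirect_to @recipe
    else
      render :action => "new"
    end
  end
end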
Not everything in app/models has to be an ActiveRecord model. Since the scrapers directly pertain to the business logic of your application, they belong in the app directory, not the lib directory. They are also neither controllers, views, nor helpers (helpers are there to help the views and the views alone). So, they belong in app/models. I would make sure to namespace them, just for organizational purposes, into app/models/scrapers or something of that sort.
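For example, such a namespaced, non-ActiveRecord model might look like this (a sketch; the names are illustrative):

# app/models/scrapers/epicurious.rb -- a plain Ruby object,
# not an ActiveRecord model
module Scrapers
  class Epicurious
    def initialize(url)
      @url = url
    end

    def attributes
      # fetch and parse the page, return a hash of recipe attributes
    end
  end
end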
I would create a folder in lib called scrapers, then within that folder create one file per scraper: epicurious, cooks, etc. You could then define a base scraper class that contains any methods common to all scrapers, similar to the following:
lib/scrapers/base.rb
module Scrapers
  class Base
    def shared_1
      # helper shared by all scrapers
    end

    def shared_2
      # another shared helper
    end

    def must_implement1
      raise NotImplementedError
    end

    def must_implement2
      raise NotImplementedError
    end
  end
end
lib/scrapers/epicurious.rb
module Scrapers
  class Epicurious < Base
    def must_implement1
      # epicurious-specific implementation
    end

    def must_implement2
      # epicurious-specific implementation
    end
  end
end
Then call the relevant class from within your controller using Scrapers::Epicurious.new, or call a class method on Scrapers::Base that invokes the relevant implementation based upon a passed argument.
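One way to write that dispatching class method, as a sketch (the URL patterns and the Cooks class are assumptions):

module Scrapers
  class Base # reopening Scrapers::Base from above
    def self.for(url)
      case url
      when /epicurious\.com/ then Epicurious.new
      when /cooks\.com/      then Cooks.new
      else raise ArgumentError, "no scraper for #{url}"
      end
    end
  end
end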
I'd set up a rake task that scrapes the site and creates the new records. Once that is working, I'd use a background processor or cron job to run the rake task.
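A minimal sketch of such a rake task (the task name is illustrative; it assumes a Recipe model with a scraped flag and a scrape_recipe method, as outlined in an earlier answer):

# lib/tasks/scrape.rake
namespace :recipes do
  desc "Scrape any remote recipes that have not been processed yet"
  task :scrape => :environment do
    Recipe.where(:scraped => false).find_each do |recipe|
      recipe.send(:scrape_recipe) # private callback on the Recipe model
    end
  end
end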