Caching web service results and expiring them
I'm a new Rails programmer working on a web app. As part of this web app I am consuming a number of JSON pages generated by other sites' web services. On a new request from the user, I may need to poll 3-5 web services.
To help with the speed of repeated common requests, I'm trying to do a local cache of the results of the services. For each service, if they have any results that match, I parse their format and insert the rows into my local table, along with the 'pull' id (which has the source I pulled from and the date).
My question is: this seems like a pretty common thing to do. I have it working fine with a single datasource but need to expand, so before I write a bunch of helper methods to make my life easier, I'm curious if there is a better way to do it in the Rails framework using a gem or some other plugin...
This type of behavior is best done in the background.
The more-conventional-but-still-awesome Ruby way of doing this would be to store your data sources in your DB and create a rake task that polls them using whatever tech you want in your app (use a gem like mechanize or nokogiri, write up a model, add some helpful Ruby classes in your app/lib folder, drop in a plugin or something from a vendor, whatever). Then you could invoke this rake task via a conventional cron job, or with something like clockwork (essentially a badass Ruby version of a recurring-task manager).
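To make the rake-task idea a bit more concrete, here is a minimal sketch in plain Ruby of the part the task would call. Everything named here (`SourceRefresher`, `Pull`, the `fetcher` and `clock` arguments) is made up for illustration and is not part of Rails or any gem. It assumes each source returns JSON and that you pass in your own callable to do the HTTP request.

```ruby
require "json"

# Hypothetical stand-in for a "pull" row: which source was polled, when,
# and the parsed rows that came back.
Pull = Struct.new(:source, :pulled_at, :rows)

# Polls every configured source once. A failure on one source is recorded
# and skipped, so it does not stop the sources that come after it.
class SourceRefresher
  # sources: hash of source name => URL
  # fetcher: any callable that takes a URL and returns a JSON string
  # clock:   callable returning the current time (injectable for testing)
  def initialize(sources, fetcher:, clock: -> { Time.now })
    @sources = sources
    @fetcher = fetcher
    @clock = clock
  end

  # Returns [pulls, errors]: the successful pulls, plus a hash of
  # source name => error message for the sources that failed.
  def refresh_all
    pulls = []
    errors = {}
    @sources.each do |name, url|
      begin
        rows = JSON.parse(@fetcher.call(url))
        pulls << Pull.new(name, @clock.call, rows)
      rescue StandardError => e
        errors[name] = e.message
      end
    end
    [pulls, errors]
  end
end
```

In a real app the rake task body would build the `sources` hash from your data source table, pass a fetcher such as `->(url) { Net::HTTP.get(URI(url)) }`, and save each returned pull to your local table instead of just returning it.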
The more new-wave way of doing this would be to drop in something like DelayedJob to handle updating a single data source. When you fetch data successfully for a given source, set the expiration out as long as you care to. The next time your application grabs that cached data after it has expired, it can create another job in the queue for one of your workers to run that'll update that data source. As soon as that job is complete, requests for that data can go to the fresh information. This prevents things like 5-minute-long rake tasks that fail on an earlier source and never let any data update, or 6-hour polling intervals where one run gets missed because the internet dropped for 12 seconds, so your data ends up stale by hours and hours.
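The expire-then-refresh flow described above could look roughly like this. It is only a sketch under some assumptions: `ExpiringSourceCache` is a made-up in-memory class (you would likely back it with your local table), and `queue` is anything that responds to `<<`, standing in for whatever enqueues a real DelayedJob job. None of these names come from DelayedJob itself.

```ruby
# Hypothetical per-source cache: reads never block on the remote service.
# An expired or missing entry just enqueues one refresh job for that source.
class ExpiringSourceCache
  Entry = Struct.new(:data, :expires_at)

  # ttl:   seconds a successful fetch stays fresh
  # queue: anything responding to <<, standing in for a background job queue
  # clock: callable returning the current time (injectable for testing)
  def initialize(ttl:, queue:, clock: -> { Time.now })
    @ttl = ttl
    @queue = queue
    @clock = clock
    @entries = {}
    @pending = {}
  end

  # Called by a worker after it has fetched a source successfully.
  def store(source, data)
    @entries[source] = Entry.new(data, @clock.call + @ttl)
    @pending.delete(source)
  end

  # Returns whatever is cached right away (nil if nothing yet, possibly
  # stale data otherwise). If the entry is missing or expired, enqueues a
  # refresh for that source, at most once until the worker calls store.
  def read(source)
    entry = @entries[source]
    if entry.nil? || @clock.call >= entry.expires_at
      unless @pending[source]
        @pending[source] = true
        @queue << source
      end
    end
    entry&.data
  end
end
```

Because each source expires and refreshes on its own, one slow or failing service only delays its own data. Note that this sketch leaves a source marked as pending if its job fails, so a real version would want the worker to clear that flag or retry.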
There are a lot of tools to use, and I know you were asking for specifics, but I hope this more general information on methodology/architecture might give you an idea of what you can do.