Follow and Scrape Links In Tweets with Rails
I'm currently parsing Twitter feeds through my Rails app and wondering how it would be possible to follow links in the tweets and scrape the external content (for example, if the tweet contains a link to an article on TechCrunch, follow that link to the article and scrape the title and body content of the article). Flipboard for iPad is a perfect example of this.
I'm familiar with screen scraping using Nokogiri / Mechanize, but I'm trying to figure out how to accomplish this in a way that works regardless of where the link in the tweet points (whether it be TechCrunch, Flickr, etc.).
Any thoughts / insights would be much appreciated!
Many of the major content providers provide oEmbed endpoints. Take a look at the oembed_links gem. From the readme:
It allows you to easily parse text and query configured providers for embedding information on the links inside the text. A sample configuration file for configuring the library has been included (oembed_links_example.yml), though you may also configure the library programmatically (see rdocs).
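To illustrate the "links inside the text" step on its own, here is a minimal sketch in plain Ruby. It does not use the oembed_links API; the `links_in` helper and the `URL_PATTERN` regex are hypothetical names made up for this example, and the pattern is deliberately simple rather than a full URL grammar.

```ruby
# Simplified pattern (an assumption, not a full URL grammar): an http(s)
# scheme followed by anything up to whitespace, quotes, or angle brackets.
URL_PATTERN = %r{https?://[^\s<>"']+}

# Hypothetical helper: return the unique http(s) links found in a tweet,
# with trailing sentence punctuation trimmed off each match.
def links_in(tweet_text)
  tweet_text
    .scan(URL_PATTERN)
    .map { |url| url.sub(/[.,;:!?)]+\z/, "") }
    .uniq
end
```

Each returned link could then be handed to whatever provider lookup you configure. Shortened links would still need their redirects followed before matching them against a provider.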
If you use oembed_links in conjunction with http://oohembed.com/ you'll have dozens of content providers handled for you. You can easily write a custom provider to handle the rest.
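For a sense of what happens underneath, here is a rough sketch of an oEmbed lookup using only the Ruby standard library. It is not how oembed_links is implemented. The provider table, the helper names, and `FALLBACK_ENDPOINT` (a placeholder standing in for a catch-all aggregator) are assumptions for illustration; the Flickr endpoint is the one given as an example in the oEmbed spec and may have changed.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed provider table: URL pattern => oEmbed endpoint.
PROVIDERS = {
  %r{\Ahttps?://(www\.)?flickr\.com/} => "http://www.flickr.com/services/oembed/"
}.freeze

# Placeholder endpoint (not a real service) for links no provider matches.
FALLBACK_ENDPOINT = "http://oembed.example.com/api"

# Build the oEmbed request URI for a link: the endpoint plus the
# spec-defined `url` and `format` query parameters.
def oembed_request_uri(link)
  match = PROVIDERS.find { |pattern, _endpoint| link =~ pattern }
  uri = URI.parse(match ? match.last : FALLBACK_ENDPOINT)
  uri.query = URI.encode_www_form(url: link, format: "json")
  uri
end

# Pick a few standard fields out of an oEmbed JSON response body.
def parse_oembed(body)
  data = JSON.parse(body)
  { type: data["type"], title: data["title"], html: data["html"] }
end

# Perform the request over the network (no redirect or error handling here).
def fetch_oembed(link)
  parse_oembed(Net::HTTP.get(oembed_request_uri(link)))
end
```

For providers with no oEmbed endpoint at all, you would still fall back to fetching the page and scraping the title and body with Nokogiri.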