Retrieving RSS posts older than those included in feed
When creating an RSS reader, you download the XML document that the RSS feed link points to, and then parse it either manually or with the SyndicationFeed class in the System.ServiceModel.Syndication namespace.
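Here is a minimal sketch of the "parse it manually" route in Python, using only the standard library's xml.etree.ElementTree. In .NET, SyndicationFeed.Load(XmlReader) plays a similar role. The sample document and its URLs are made up, and the sketch assumes a plain RSS 2.0 feed with no namespaced elements. It can only return the items the document actually contains, which is the limitation this question is about.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real reader would download the feed first.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Newest post</title>
      <link>https://example.com/posts/2</link>
      <guid>https://example.com/posts/2</guid>
      <pubDate>Tue, 02 Mar 2010 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Older post</title>
      <link>https://example.com/posts/1</link>
      <guid>https://example.com/posts/1</guid>
      <pubDate>Mon, 01 Mar 2010 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["pubDate"], entry["title"])
```

However many years a blog has existed, this returns only what the publisher chose to put in the document (15 items in the example above).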
Taking Scott Guthrie's blog as an example, you download the RSS feed document here and parse it. My problem is that this document holds only 15 items, yet he has been blogging for a number of years.
Is there a standard or established way of getting the older posts not included in the RSS feed document? Or do you have to find the base address for the blog posts and then parse the pages of the site from there to get them? How do you avoid missing posts on high-volume blogs?
With RSS/Atom you can't query older articles.
I built an RSS archival service (https://app.pub.center). All of our data is free to use via REST. We charge money for push notifications.
PubCenter polls its catalog of RSS feeds daily and caches the articles. You can then retrieve these articles in chronological order. For example:
Page 1 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=1
Page 2 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=2
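A client could walk these pages until it runs out of articles. The sketch below follows the URL pattern shown above. It also assumes, and this is only an assumption, that each page's body is a JSON array of articles. The real response shape is not described here, so check it before relying on this. The helper names (page_url, fetch_page, iter_all_articles) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pub.center/feed/{feed_id}/articles"


def page_url(feed_id, page, limit=10):
    """Build a page URL following the pattern in the examples above."""
    return BASE.format(feed_id=feed_id) + "?" + urlencode({"limit": limit, "page": page})


def fetch_page(url):
    """Download one page.

    ASSUMPTION: the body is a JSON array of articles. Adapt this to
    whatever the API actually returns.
    """
    with urlopen(url) as resp:
        return json.load(resp)


def iter_all_articles(feed_id, limit=10, fetch=fetch_page, max_pages=1000):
    """Yield articles page by page, stopping at the first short or empty page."""
    for page in range(1, max_pages + 1):
        articles = fetch(page_url(feed_id, page, limit))
        yield from articles
        if len(articles) < limit:
            break
```

Passing a different `fetch` function keeps the paging logic testable without network access. The `max_pages` cap guards against an endpoint that never returns a short page.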
As the replies to "How Do I Fetch All Old Items on an RSS Feed?" already mention, a feed may not provide archival data, but historical items may be available from another source.
Archive.org’s Wayback Machine has an API to access historical content, including RSS feeds (if their bots have downloaded it). I’ve created the web tool Backfeed that uses this API to regenerate a feed containing concatenated historical items. If you'd like to discuss the implementation in detail please get in touch.
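To give a rough idea of how such a tool can work, here is a sketch of one possible approach. It is not Backfeed's actual code. It uses the Wayback Machine's CDX server API to list captures of a feed URL and turns them into raw-capture URLs. It then merges the items from several already-parsed snapshots, dropping duplicates by guid and falling back to link. Downloading and parsing each snapshot is left out. Items are assumed to be dicts with optional "guid" and "link" keys, and all function names are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(feed_url):
    """Ask the CDX API for captures of feed_url as JSON, keeping only
    HTTP 200 captures and collapsing consecutive identical captures."""
    params = {"url": feed_url, "output": "json",
              "filter": "statuscode:200", "collapse": "digest"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json_text):
    """Turn a CDX JSON response into URLs of the archived feed documents.

    The first row of the JSON output is a header naming the columns.
    """
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    # The "id_" flag requests the capture as archived, without Wayback rewriting.
    return [f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}" for row in captures]


def merge_items(snapshots):
    """Concatenate items from parsed feed snapshots (oldest snapshot first),
    keeping the first copy of each item seen, keyed by guid or else link."""
    seen = {}
    for items in snapshots:
        for item in items:
            key = item.get("guid") or item.get("link")
            seen.setdefault(key, item)
    return list(seen.values())  # first-seen order; sort by date if needed
```

Because feeds usually list newest items first, the merged list is only in first-seen order. A reader would typically sort it by publication date afterwards.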