While making an RSS reader which saves articles, how can I prevent duplicates?
Let's say I have an RSS feed which lists the 3 newest questions on SO. At 1 o'clock, the feed looks like this:
- While making an RSS reader which saves articles, how can I prevent duplicates?
- Convert char array to UNICODE in MFC C++
- How to deploy a Java Swing application with an embedded JavaDB database?
At 2 o'clock, this feed looks like:
- django url from another template than the one associated with the view-function
- While making an RSS reader which saves articles, how can I prevent duplicates?
- Convert char array to UNICODE in MFC C++
(The last two items are the duplicates: they were already in the feed at 1 o'clock.)
I want to download the RSS feed every 5 minutes, parse it and save the articles that aren't already saved, but I do not want duplicates (items that remain in the new, updated feed like the examples above). What can I use to determine if an article is already saved? Thanks
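In theory, you can just use the guid element for RSS 2.0 and the id element for Atom. Each is supposed to be permanent and unique. In practice, however, some sites don't conform to this, so you need a fallback heuristic for items whose identifier is missing or unreliable, such as comparing the item's link or a hash of its link and title.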
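As a rough illustration of that approach, here is a minimal Python sketch using only the standard library. It keys each item on guid (RSS 2.0) or id (Atom) and, when neither is present, falls back to a hash of the link and title. The function names article_key and new_articles, the in-memory seen_keys set, and the choice of link plus title as the fallback are all assumptions made for the example, not something either feed format prescribes.

```python
import hashlib
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def _text(element, tag):
    """Return the stripped text of a child element, or '' if absent or empty."""
    child = element.find(tag)
    if child is None or child.text is None:
        return ""
    return child.text.strip()


def article_key(item):
    """Build a deduplication key for one RSS <item> or Atom <entry>."""
    # Preferred: the identifier the feed itself provides.
    ident = _text(item, "guid") or _text(item, ATOM_NS + "id")
    if ident:
        return "id:" + ident

    # Heuristic fallback (an assumption of this sketch, not part of
    # either format): hash the link and title together.
    link = _text(item, "link")
    if not link:
        atom_link = item.find(ATOM_NS + "link")
        if atom_link is not None:
            link = atom_link.get("href", "")
    title = _text(item, "title") or _text(item, ATOM_NS + "title")
    digest = hashlib.sha256((link + "\n" + title).encode("utf-8")).hexdigest()
    return "hash:" + digest


def new_articles(feed_xml, seen_keys):
    """Return the items not seen before and record their keys in seen_keys."""
    root = ET.fromstring(feed_xml)
    items = list(root.iter("item")) + list(root.iter(ATOM_NS + "entry"))
    fresh = []
    for item in items:
        key = article_key(item)
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(item)
    return fresh
```

In a real reader the set of seen keys would probably live wherever the articles are saved, for example as a uniquely indexed column in the database, so that it survives restarts. The title-based fallback is deliberately crude: if a site edits a title, the item will be treated as new, so the heuristic may need tuning per feed.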