How do I monitor forum posts to check only new posts, instead of checking the entire list each time?
I'm developing software that monitors posts on forums and alerts the admin/moderators when keywords are matched in the post title (swear words, porn, etc.).
I've set up a timer that checks the forum every 30 seconds, since it's a busy forum. My issue is how to store the "last post checked" so that the next run doesn't go through the whole forum again.
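The keyword check on each title can be done with a word-boundary regular expression. Below is a minimal C# sketch. The class name `TitleFilter` and the method `FindMatches` are hypothetical, and you are assumed to supply the keyword list yourself:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleFilter
{
    // Returns every keyword that appears in the title as a whole word,
    // ignoring case. "\b" marks a word boundary, so the keyword "ass"
    // does not match inside "classic".
    public static List<string> FindMatches(string title, IEnumerable<string> keywords)
    {
        return keywords
            .Where(k => Regex.IsMatch(title, @"\b" + Regex.Escape(k) + @"\b", RegexOptions.IgnoreCase))
            .ToList();
    }
}
```

Whole-word matching avoids flagging innocent words that merely contain a keyword. The trade-off is that it misses deliberate misspellings.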
I have no idea how to go about it. I've tried a few things that don't seem to work. I'm getting really annoyed at myself more than anything, as I've been through university (software engineering) and can't solve a simple problem.
Any advice appreciated.
Edit: I'm parsing the HTML, because the forum owner does not want the application to connect to the database.
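Since the data comes from HTML, the first step is pulling a post identifier out of each page. The sketch below assumes a hypothetical link format, `showthread.php?t=<number>`. The real pattern depends entirely on the forum's markup, and an HTML parser is more robust than a regular expression if the markup is complex. `ThreadLinkParser` and `ExtractIds` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ThreadLinkParser
{
    // HYPOTHETICAL link format: assumes each thread is linked as
    // "showthread.php?t=<number>". Adjust to the forum's real markup.
    private static readonly Regex ThreadLink =
        new Regex(@"showthread\.php\?t=(\d+)", RegexOptions.IgnoreCase);

    // Returns the distinct numeric ids found in the page's links.
    public static List<long> ExtractIds(string html)
    {
        return ThreadLink.Matches(html)
            .Cast<Match>()
            .Select(m => long.Parse(m.Groups[1].Value))
            .Distinct()
            .ToList();
    }
}
```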
Why don't you just implement a profanity filter? Now, before I get downvoted for that: I'm totally against them and think they're a horribly stupid idea, but I know many clients require them for legal reasons.
But instead of checking the forum after the fact, why not check for swear words before the post is submitted?
What kind of access do you have to things like posts? If you can execute a simple query like "SELECT * FROM [ForumPosts] Where PostTimeStamp > @lastChecked", what problem are you facing?
You need to save the id of the last post that you checked, then on each run, only check posts with an id higher than the saved id. You could either save this id to a text file, or to a database table.
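A minimal sketch of saving the id to a text file in C#. It assumes post ids are numeric (`long`). The `Checkpoint` class and whatever file path you pass in are hypothetical:

```csharp
using System.Globalization;
using System.IO;

public static class Checkpoint
{
    // Returns the last checked post id stored in the file, or 0 when the
    // file does not exist yet (first run) or does not contain a number.
    public static long Load(string path)
    {
        if (!File.Exists(path)) return 0;
        string text = File.ReadAllText(path).Trim();
        return long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out long id) ? id : 0;
    }

    // Overwrites the file with the id of the newest post that was checked.
    public static void Save(string path, long lastCheckedId)
    {
        File.WriteAllText(path, lastCheckedId.ToString(CultureInfo.InvariantCulture));
    }
}
```

On each timer tick you would call `Load`, check only posts with a higher id, then `Save` the highest id you processed. Because the value lives in a file, it survives an application restart.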
Here is an overview of how to read and write to files that might help you get started.
It sounds like you are doing this by parsing the HTML, is that right?
If you have access to the backend data store of the forum, it would be much easier. For example, if they have a Posts table, you just keep the last ID that you checked. If you have to work through the HTML result, it will be much trickier.
Don't feel bad. Most universities, at least in the USA, are clueless about how to teach their students current software development skills.
What if you stored the last post checked in a table in your database? When it's time to scan the forum, read the number of the last post (or whatever you store there) and go from there. When done, update the table with the last post number.
I'm not sure how you've implemented your solution, but if, for example, the posts had a number, you could store the last number in a variable and then check against that variable when re-checking.
That's assuming you have your numbers in order.
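If the numbers are assigned in increasing order, comparing against the stored number might look like the sketch below. `NewPostFilter` is a hypothetical helper, and ids are assumed to be numeric:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NewPostFilter
{
    // Keeps only ids above the saved checkpoint, sorted oldest first, and
    // returns the new checkpoint via newLastId. Taking the largest id rather
    // than the last one on the page copes with pages that are not in id
    // order (for example, pinned threads shown at the top).
    public static List<long> NewIds(IEnumerable<long> pageIds, long lastCheckedId, out long newLastId)
    {
        List<long> fresh = pageIds
            .Where(id => id > lastCheckedId)
            .Distinct()
            .OrderBy(id => id)
            .ToList();
        newLastId = fresh.Count > 0 ? fresh[fresh.Count - 1] : lastCheckedId;
        return fresh;
    }
}
```

A plain variable is lost when the application restarts, so in practice the checkpoint would also need to be persisted somewhere, such as a file.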
Does each forum post have an ID? If so, you can keep track of the IDs you've already checked (or, if the IDs are incremental, check only the IDs greater than the last checked ID).
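When the ids are not incremental (for example, when posts are identified only by URL), keeping the set of ids already checked works instead. A sketch, with `SeenPostTracker` as a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public class SeenPostTracker
{
    // Every post id (or URL) that has already been checked.
    private readonly HashSet<string> _seen = new HashSet<string>();

    // HashSet<T>.Add returns false when the item is already present, so this
    // returns only ids not seen before, recording them as it goes.
    public List<string> FilterNew(IEnumerable<string> pageIds)
    {
        return pageIds.Where(id => _seen.Add(id)).ToList();
    }
}
```

The set grows with every post, so a long-running monitor would need to persist it or prune old entries.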
Method 1: if you have access to the records (posts) in the database, use those.
Method 2: if you are consuming content such as an RSS feed, you will have to keep the records from the last check and compare the new entries against them to see whether they have already been reviewed.
Something similar:
```csharp
using System.Collections.Generic;
using System.Linq;

public class PostCompareManager
{
    // Returns the posts in newPosts that were not in reviewedPosts.
    // The key may be the post's URL (string) or anything else that
    // uniquely identifies each post.
    public List<Post> ComparePosts(Dictionary<string, Post> reviewedPosts,
                                   Dictionary<string, Post> newPosts)
    {
        // reviewedPosts: posts from the previous check.
        // newPosts: posts from the current fetch (fill both with your HTTP GET logic).
        return newPosts
            .Where(p => !reviewedPosts.ContainsKey(p.Key))
            .Select(p => p.Value)
            .ToList();
    }
}

public class Post
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Url { get; set; }
}
```
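For the RSS case, the feed itself can be read with `System.Xml.Linq`. This sketch assumes an RSS 2.0 feed whose `<item>` elements carry a `<guid>` or `<link>`; `RssReader`, `ReadItems` and `NewKeys` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssReader
{
    // Maps each <item>'s <guid> (falling back to <link>) to its <title>.
    // The explicit string conversion of a missing XElement yields null.
    public static Dictionary<string, string> ReadItems(string rssXml)
    {
        var result = new Dictionary<string, string>();
        foreach (XElement item in XDocument.Parse(rssXml).Descendants("item"))
        {
            string key = (string)item.Element("guid") ?? (string)item.Element("link");
            if (key == null || result.ContainsKey(key)) continue;
            result[key] = (string)item.Element("title") ?? "";
        }
        return result;
    }

    // Keys present in the current fetch but not in the previous one.
    public static List<string> NewKeys(Dictionary<string, string> previous, Dictionary<string, string> current)
        => current.Keys.Where(k => !previous.ContainsKey(k)).ToList();
}
```

Only the titles of the keys returned by `NewKeys` then need to go through the keyword check.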
Hope this helps