PHP and Twitter | Create Index Engine
Here is what I have in mind:
1) Create a service that will run every hour or so and search for tweets matching specific criteria
2) I also need to filter out garbage (the index engine needs to be smart enough, kind of like an anti-spam service)
What are the best strategies/ideas to accomplish this?
PS
Does anyone know of an anti-spam engine that already exists for Twitter?
For starters, the best place to begin is probably the Twitter API: get your search working there first. If your server stack is of the *nix persuasion, using cron to schedule a wget/curl request to your search page is probably the simplest strategy. Unfortunately, my Windows task-scheduling knowledge is sorely lacking, but I'm sure there are better options than the crusty Task Scheduler.
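To make the hourly job a bit more concrete, here is a rough Python sketch of one polling run that keeps only tweets it has not seen before. It deliberately does not call the Twitter API: `search` is a hypothetical function you would write yourself, assumed to return a list of dicts with `id` and `text` keys. The state file and the crontab paths in the comment are placeholders as well.

```python
import json
from pathlib import Path

# Hypothetical crontab entry (placeholder paths) that would run a script
# built around poll_once() at the top of every hour:
#   0 * * * * /usr/bin/python3 /path/to/poll_tweets.py


def load_seen(state_path):
    """Return the set of tweet ids recorded by earlier runs."""
    if state_path.exists():
        return set(json.loads(state_path.read_text()))
    return set()


def save_seen(state_path, seen):
    """Persist the ids so the next cron run can skip them."""
    state_path.write_text(json.dumps(sorted(seen)))


def poll_once(search, query, state_path):
    """Run one search and return only the tweets not seen before.

    `search` is an assumed callable: search(query) -> list of dicts,
    each with at least an "id" and a "text" key.
    """
    seen = load_seen(state_path)
    fresh = []
    for tweet in search(query):
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            fresh.append(tweet)
    save_seen(state_path, seen)
    return fresh
```

Each run hands back only the new tweets, so whatever filtering or indexing comes next never processes the same tweet twice.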
Finally, for the filtering: writing your own Bayesian classifier may be overkill if there is a service you can subscribe to, but I am not aware of one for Twitter. Bayesian classifiers are quite common, and a little research with your favorite search engine should turn up either a canned solution or at least some direction on how to create your own. Keep in mind that spam is relative, so you have to train your classifier, which is a bit time-consuming at the start. In fact, PHP might not be the best language for this task; the training could be done by a separate program that your crontab also calls periodically.
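If you do end up rolling your own, the core of a naive Bayes filter is small. The following is only a minimal sketch in Python, not a production filter: the class name, the "spam"/"ham" labels, the tokenizer, and the add-one smoothing are all my own assumptions, and you would have to train it on tweets you have labeled by hand.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesFilter:
    """Tiny two-class naive Bayes classifier with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one hand-labeled example."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior for the class.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Smoothed log likelihood of each token; the extra +1 in the
        # denominator reserves probability mass for unseen words.
        denom = total_words + len(vocab) + 1
        for tok in tokens:
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def classify(self, text):
        """Return "spam" or "ham"; ties go to "ham"."""
        tokens = tokenize(text)
        spam = self._log_score(tokens, "spam")
        ham = self._log_score(tokens, "ham")
        return "spam" if spam > ham else "ham"
```

You would call `train()` once per labeled tweet and then `classify()` on each new one. Letting ties fall to "ham" is a design choice that errs on the side of keeping a tweet rather than discarding it.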
I realize that this is very high level, but the links should be enough of a jumping off point to get you started in the right direction.
You might want to look into http://www.socialoomph.com. They offer a service that will do what you are looking for.