
What is a good open source package for building flexible spam detection on a large Rails site?

My site is getting larger and it's starting to attract a lot of spam through various channels. The site has many different types of UGC (profiles, forums, blog comments, status updates, private messages, etc.). I have various mitigation efforts underway, which I hope to deploy in a blitzkrieg fashion to convince the spammers that we're not a worthwhile target. I have high confidence in what I'm doing functionality-wise, but the one missing piece is a way to kill all the old spam at once.

Here's what I have:

  • Large good/bad corpora (5-figure bad, 6 or 7-figure good). A lot of the spam has very reliable fingerprints, and the fact that I've sort of been ignoring it for 6 months helps :)
  • Large, modular Rails site deployed to AWS. It's not a huge traffic site, but we're running 8 instances with the beginnings of a SOA.
  • Ruby, Redis, Resque, MySQL, Varnish, Nginx, Unicorn, Chef, all on Gentoo

My requirements:

  1. I want it to perform reasonably well given the volume of data (therefore I'm wary of a pure ruby solution).
  2. I should be able to train separate classifiers for different types of spam (419 scams vs. botnet link spam)
  3. I would like to be able to add manual factors based on our own detective work (pattern matching, IP reuse, etc)
  4. Ultimately I want to construct a nice interface to be used with Ruby. If this requires getting my hands dirty in C or whatever, I can handle it, but I'll avoid it if I can.
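
To make requirements 2 and 3 concrete, here is a rough sketch of the kind of interface I have in mind: one naive Bayes model per spam category, with manually weighted rules added on top of the statistical score. All of the names (`CategoryClassifier`, `SpamScorer`, `add_rule`, the `:ip_seen_before` key) are made up for illustration, and it is pure Ruby only to show the shape of the API. Given requirement 1, I would expect the real thing to put a native library behind a similar interface.

```ruby
# Hypothetical sketch, not an existing gem: per-category naive Bayes plus manual rules.
class CategoryClassifier
  def initialize
    @word_counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @totals = { spam: 0, ham: 0 }   # total tokens seen per label
    @docs   = { spam: 0, ham: 0 }   # total documents seen per label
  end

  # label is :spam or :ham
  def train(label, text)
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @totals[label] += 1
    end
    @docs[label] += 1
  end

  # Log-odds that the text is spam (positive means more spam-like),
  # using add-one (Laplace) smoothing for unseen words.
  def log_odds(text)
    vocab = [(@word_counts[:spam].keys | @word_counts[:ham].keys).size, 1].max
    prior = Math.log((@docs[:spam] + 1.0) / (@docs[:ham] + 1.0))
    tokenize(text).reduce(prior) do |sum, word|
      p_spam = (@word_counts[:spam][word] + 1.0) / (@totals[:spam] + vocab)
      p_ham  = (@word_counts[:ham][word] + 1.0) / (@totals[:ham] + vocab)
      sum + Math.log(p_spam / p_ham)
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

class SpamScorer
  def initialize
    @classifiers = Hash.new { |hash, key| hash[key] = CategoryClassifier.new }
    @rules = []
  end

  def train(category, label, text)
    @classifiers[category].train(label, text)
  end

  # A rule is a block taking (text, meta); its weight is added when it returns truthy.
  def add_rule(weight, &rule)
    @rules << [weight, rule]
  end

  # Returns [most_likely_category, combined_score].
  def score(text, meta = {})
    category, odds = @classifiers
      .map { |name, classifier| [name, classifier.log_odds(text)] }
      .max_by { |_, value| value }
    manual = @rules.sum { |weight, rule| rule.call(text, meta) ? weight : 0.0 }
    [category, (odds || 0.0) + manual]
  end
end

scorer = SpamScorer.new
scorer.train(:advance_fee, :spam, "dear friend transfer million dollars bank inheritance")
scorer.train(:advance_fee, :ham,  "thanks for the great forum post about rails")
scorer.train(:link_spam,   :spam, "cheap pills buy now click link")
scorer.train(:link_spam,   :ham,  "thanks for the great forum post about rails")
scorer.add_rule(2.0) { |_text, meta| meta[:ip_seen_before] }

category, score = scorer.score("buy cheap pills now", ip_seen_before: true)
puts "#{category}: #{score.round(2)}"
```

The threshold for treating a score as spam, and the weights on the manual rules, would have to be tuned against the good/bad corpora.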

I realize this is a long and vague question, but what I'm looking for primarily is just a list of good packages, and secondarily any random thoughts from someone who has built a similar system about ways to approach it.


We looked for an acceptable open source solution and didn't find one.

If you come to the same conclusion and decide to consider proprietary anti-spam, check out Akismet, a paid collaborative spam filtering service. We've had decent performance from it across a dozen medium-sized sites. It integrates with Rails through Rack and rackismet.
