How does the Gmail spam filter work?

2023-01-09 00:34

I'm always surprised by the high quality of Gmail's spam filter. Over the last year, it filtered 99.95% of the spam and blocked only one legitimate mail by mistake. By comparison, every other mail service I have used makes at least one mistake for every 50 mails.

How does Gmail reach this level of quality internally? Is it based on customer feedback (i.e. if N customers mark a mail as spam, it is sorted as spam for every other customer)? Or is there some trick? Maybe a basic filter algorithm catches the most obvious spam, and the difficult cases are analyzed by real humans?

Briefly speaking, this is based on community feedback. Here is a quote from the official explanation:

Gmail users play an important role in keeping spammy messages out of millions of inboxes. When the Gmail community votes with their clicks to report a particular email as spam, our system quickly learns to start blocking similar messages. The more spam the community marks, the smarter our system becomes.

You can read a bit more about it on their Spam Explained page.
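To make the "votes with their clicks" idea concrete, here is a minimal sketch of report counting. It is not Gmail's actual mechanism: the class name `CommunityReports`, the threshold of 3, and the exact-match SHA-256 fingerprint are all assumptions made for illustration. A real system would need fuzzy matching to catch "similar" messages, not just identical ones.

```python
import hashlib
from collections import defaultdict


class CommunityReports:
    """Hypothetical report counter: block a message once enough distinct users report it."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # assumed value, not a known Gmail setting
        self.reporters = defaultdict(set)   # fingerprint -> set of reporting user ids

    @staticmethod
    def fingerprint(body):
        # Normalize case and whitespace, then hash. Only exact duplicates match.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user_id, body):
        # A set ensures one user cannot vote twice on the same message.
        self.reporters[self.fingerprint(body)].add(user_id)

    def is_blocked(self, body):
        reporters = self.reporters.get(self.fingerprint(body), ())
        return len(reporters) >= self.threshold


reports = CommunityReports(threshold=3)
for user in ("alice", "bob", "bob"):
    reports.report(user, "WIN a FREE   prize today")
print(reports.is_blocked("win a free prize today"))  # two distinct reporters so far
reports.report("carol", "Win a free prize today")
print(reports.is_blocked("win a free prize today"))  # third distinct reporter
```

Counting distinct reporters rather than raw clicks is a deliberate choice in this sketch, so that a single user cannot push a message over the threshold alone.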

This is the million-dollar question, and if it could be answered on Stack Overflow, then everyone's spam filter would be as effective.

I don't know exactly how Google does spam filtering (I think it's a business secret after all). If you are interested in how spam filtering works, I would recommend looking at Bayesian spam filtering (http://en.wikipedia.org/wiki/Bayesian_spam_filtering). It's a rather easy-to-understand method.
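For readers who want to see the Bayesian approach in code, below is a minimal naive Bayes sketch with Laplace smoothing. It is a toy written for illustration, not what Google runs: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, and the four training messages are assumptions. It also assumes at least one training message per label.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Start from the log prior P(label).
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert the two log scores into a normalized probability.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap viagra now", "spam")
spam_filter.train("male enhancement on sale buy now", "spam")
spam_filter.train("meeting notes for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")

print(spam_filter.spam_probability("buy viagra now"))
print(spam_filter.spam_probability("team meeting monday"))
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once messages get longer than a few words.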

Google is most likely using a classifier, such as logistic regression or neural networks. State-of-the-art spam detection frequently employs machine learning algorithms such as these.

The output classification is "Spam" or "Not Spam." The inputs are presumably top secret at Google, but I'm sure certain email text phrases such as "Buy Now," "On Sale," "Viagra," or "Male Enhancement" are all factors in their model.
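As a rough illustration of the classifier idea, here is a small logistic regression trained by stochastic gradient descent on keyword-presence features. The keyword list reuses the phrases mentioned above; the training messages, learning rate, and epoch count are assumptions chosen for the example, and nothing here reflects Google's actual inputs or model.

```python
import math

# Phrases taken from the text above; a real model would use far richer inputs.
KEYWORDS = ["buy now", "on sale", "viagra", "male enhancement"]


def features(text):
    # One binary feature per keyword: 1.0 if the phrase occurs in the text.
    lowered = text.lower()
    return [1.0 if keyword in lowered else 0.0 for keyword in KEYWORDS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    # Probability that the message is spam.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(examples, epochs=500, learning_rate=0.5):
    weights = [0.0] * len(KEYWORDS)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = features(text)
            # Gradient of the log loss with respect to the logit.
            error = predict(weights, bias, x) - label
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training set: label 1 is "Spam", label 0 is "Not Spam".
EXAMPLES = [
    ("Buy now, everything on sale", 1),
    ("Cheap viagra, buy now", 1),
    ("Male enhancement pills on sale", 1),
    ("Minutes from Monday's meeting", 0),
    ("Are we still on for lunch?", 0),
    ("Your invoice is attached", 0),
]

weights, bias = train(EXAMPLES)
print(predict(weights, bias, features("Viagra on sale")))
print(predict(weights, bias, features("See you at the meeting")))
```

The learned weights show how much each phrase pushes a message toward "Spam", which is one reason linear models like this are easy to inspect.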

There is no official release on this, and most of the suggestions are just observations or expert opinions.

Based on my observations of the emails we deliver, here are my findings:

1. User engagement is the key: if users do not engage with your emails, those emails are likely to be flagged as spam. Here are some of the metrics:
   - Whom you email, and how often you email them
   - Which emails you open
   - Which emails you reply to
   - Keywords that are in emails you usually read
   - Which emails you star, archive, or delete

2. Sender domain reputation: What is the history of the sending domain? If user engagement was high in the past, then a new email from the same domain has a high probability of landing in the Inbox.
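The two findings above can be combined into a toy reputation score: count engagement signals per sending domain and turn them into a smoothed ratio. This is purely a sketch of the idea. The event names, the `DomainReputation` class, and the prior of one positive and one negative pseudo-event are all assumptions, and how Gmail actually weighs such signals is not public.

```python
from collections import defaultdict

# Hypothetical event names, loosely based on the metrics listed above.
POSITIVE_EVENTS = {"open", "reply", "star"}
NEGATIVE_EVENTS = {"delete_unread", "report_spam"}


class DomainReputation:
    """Toy per-domain score in (0, 1): higher means more past engagement."""

    def __init__(self, prior_positive=1.0, prior_negative=1.0):
        # Priors act as pseudo-counts so an unseen domain starts at 0.5.
        self.prior_positive = prior_positive
        self.prior_negative = prior_negative
        self.positive = defaultdict(float)
        self.negative = defaultdict(float)

    def record(self, domain, event):
        if event in POSITIVE_EVENTS:
            self.positive[domain] += 1.0
        elif event in NEGATIVE_EVENTS:
            self.negative[domain] += 1.0
        else:
            raise ValueError(f"unknown event: {event}")

    def score(self, domain):
        pos = self.positive.get(domain, 0.0) + self.prior_positive
        neg = self.negative.get(domain, 0.0) + self.prior_negative
        return pos / (pos + neg)


reputation = DomainReputation()
for event in ("open", "reply", "star"):
    reputation.record("newsletter.example", event)
for event in ("report_spam", "delete_unread", "report_spam"):
    reputation.record("bulk.example", event)

print(reputation.score("newsletter.example"))  # engaged-with domain scores high
print(reputation.score("bulk.example"))        # reported domain scores low
print(reputation.score("unknown.example"))     # no history, neutral score
```

Because the score is built from accumulated history, switching to a fresh domain only resets it to the neutral prior rather than earning a good reputation.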

Google uses complex AI and machine learning algorithms to make this happen. You might get some success by changing the IP, domain, or return-path, but those are all very short-term hacks.

