Combining Naive Bayes and tri-gram language model for spam filtering

Xiao Ma, Yao Shen, Junbo Chen, Guirong Xue
DOI: https://doi.org/10.1007/978-3-642-25661-5_63
2011-01-01
Abstract: The increasing volume of bulk unsolicited email (also known as spam) causes heavy damage to email service providers and inconvenience to individual users. Among the approaches to stopping spam, the Naive Bayes filter is very popular. In this paper, we propose combining the standard Naive Bayes model with a tri-gram language model, yielding the TGNB model, to filter spam emails. The TGNB model addresses the strong independence assumption of the standard Naive Bayes model. Our experimental results on three public datasets indicate that the TGNB model achieves higher spam recall and a lower false positive rate, and even outperforms the support vector machine (SVM), a state-of-the-art method, on all three datasets.
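The abstract does not spell out how TGNB combines the two models. As a rough illustration only, the sketch below assumes one common reading: each class gets its own tri-gram language model, and a message is assigned to the class maximizing log P(c) + Σ log P(w_i | w_{i-2}, w_{i-1}, c), so each word is conditioned on its two predecessors instead of being treated as independent of its neighbors. The interpolation weights, the add-one smoothed unigram, the back-off to lower orders for unseen histories, the `<s>` padding token and the class name `TrigramNaiveBayes` are all hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # hypothetical padding token so the first words have a two-word history


class TrigramNaiveBayes:
    """Sketch: class-conditional tri-gram language models used as a Bayes classifier.

    Assumption: interpolated smoothing with fixed weights; not the paper's exact scheme.
    """

    def __init__(self, lambdas=(0.6, 0.3, 0.1)):
        # Interpolation weights for trigram, bigram and unigram estimates (assumed values).
        self.l3, self.l2, self.l1 = lambdas
        self.doc_counts = Counter()       # documents per class, for the prior P(c)
        self.uni = defaultdict(Counter)   # class -> count(w)
        self.bi = defaultdict(Counter)    # class -> count(w1, w)
        self.tri = defaultdict(Counter)   # class -> count(w2, w1, w)
        self.ctx1 = defaultdict(Counter)  # class -> count(w1) used as a bigram history
        self.ctx2 = defaultdict(Counter)  # class -> count(w2, w1) used as a trigram history
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, c in zip(docs, labels):
            self.doc_counts[c] += 1
            padded = [BOS, BOS] + list(tokens)
            for i in range(2, len(padded)):
                w2, w1, w = padded[i - 2], padded[i - 1], padded[i]
                self.uni[c][w] += 1
                self.bi[c][(w1, w)] += 1
                self.tri[c][(w2, w1, w)] += 1
                self.ctx1[c][w1] += 1
                self.ctx2[c][(w2, w1)] += 1
                self.vocab.add(w)
        return self

    def _prob(self, c, w2, w1, w):
        """Interpolated estimate of P(w | w2, w1, c)."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        total = sum(self.uni[c].values())
        p1 = (self.uni[c][w] + 1) / (total + v)  # add-one smoothed unigram
        n1 = self.ctx1[c][w1]
        p2 = self.bi[c][(w1, w)] / n1 if n1 else p1  # fall back to unigram if history unseen
        n2 = self.ctx2[c][(w2, w1)]
        p3 = self.tri[c][(w2, w1, w)] / n2 if n2 else p2  # fall back to bigram if unseen
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

    def predict(self, tokens):
        n_docs = sum(self.doc_counts.values())
        padded = [BOS, BOS] + list(tokens)

        def log_score(c):
            s = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
            for i in range(2, len(padded)):
                s += math.log(self._prob(c, padded[i - 2], padded[i - 1], padded[i]))
            return s

        return max(self.doc_counts, key=log_score)


# Toy usage with made-up, pre-tokenized messages (illustrative data only).
spam = [["win", "free", "money", "now"], ["free", "money", "offer"]]
ham = [["meeting", "at", "noon"], ["lunch", "meeting", "tomorrow"]]
model = TrigramNaiveBayes().fit(spam + ham, ["spam"] * 2 + ["ham"] * 2)
print(model.predict(["free", "money"]))        # spam
print(model.predict(["meeting", "tomorrow"]))  # ham
```

Compared with a unigram Naive Bayes filter, the only change in this sketch is the conditioning context of each word probability; the prior and the argmax decision rule are unchanged.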