A Support Vector Machine Based Naive Bayes Algorithm for Spam Filtering

Weimiao Feng, Jianguo Sun, Liguo Zhang, Cuiling Cao, Qing Yang
DOI: https://doi.org/10.1109/pccc.2016.7820655
2016-01-01
Abstract: Naive Bayes classifiers are widely used to filter spam emails; however, the strong independence assumptions between features limit their ability to identify spam accurately. To address this issue, we propose a support vector machine based naive Bayes (SVM-NB) filtering system. SVM-NB first constructs an optimal separating hyperplane that divides the samples in the training set into two categories. For samples located near the hyperplane, if they belong to different categories, one of them is eliminated from the training set. In this way, the dependence between samples is reduced and the entire training sample space is simplified. The naive Bayes algorithm is then trained on the trimmed training set and applied to classify the emails in the test set. The SVM-NB system is evaluated on a dataset obtained from DATAMALL. Experimental results demonstrate that SVM-NB can achieve higher spam-detection accuracy and faster classification.
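The abstract describes the SVM-NB pipeline only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. A linear SVM trained with Pegasos-style subgradient descent stands in for whatever SVM solver the paper used. "Near the hyperplane" is assumed to mean a geometric distance below a hypothetical threshold `band`. The trimming rule is read as: pair each near sample with its nearest near neighbour and, if their labels differ, drop the one closer to the hyperplane. A multinomial naive Bayes with Laplace smoothing is then fit on the trimmed set. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss (a stand-in solver).
    Labels are +1 (spam) / -1 (ham). Returns weights with the bias as the last entry."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]  # constant feature so the bias is learned too
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, Xa[i]) < 1  # hinge loss is positive for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def decision(w, x):
    return dot(w[:-1], x) + w[-1]


def trim_near_hyperplane(X, y, w, band=1.0):
    """Assumed reading of the trimming rule: among samples whose geometric distance to
    the hyperplane is below `band` (hypothetical), pair each with its nearest near
    neighbour; if the labels differ, drop the one closer to the hyperplane."""
    norm = math.sqrt(dot(w[:-1], w[:-1])) or 1e-12
    f = [decision(w, x) for x in X]
    near = [i for i in range(len(X)) if abs(f[i]) / norm < band]
    removed = set()
    for i in near:
        if i in removed:
            continue
        others = [j for j in near if j != i and j not in removed]
        if not others:
            continue
        j = min(others, key=lambda k: sq_dist(X[i], X[k]))
        if y[i] != y[j]:
            removed.add(i if abs(f[i]) < abs(f[j]) else j)
    return [i for i in range(len(X)) if i not in removed]


def fit_multinomial_nb(X, y, alpha=1.0):
    """Multinomial naive Bayes over word counts with Laplace (add-alpha) smoothing."""
    model = {}
    d = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, yi in zip(X, y) if yi == c]
        counts = [sum(col) for col in zip(*rows)]  # per-word totals for class c
        total = sum(counts)
        log_theta = [math.log((n + alpha) / (total + alpha * d)) for n in counts]
        model[c] = (math.log(len(rows) / len(y)), log_theta)
    return model


def predict_nb(model, x):
    def log_posterior(c):
        log_prior, log_theta = model[c]
        return log_prior + dot(x, log_theta)
    return max(model, key=log_posterior)


def svm_nb_fit(X, y, band=1.0):
    w = fit_linear_svm(X, y)
    kept = trim_near_hyperplane(X, y, w, band)
    return fit_multinomial_nb([X[i] for i in kept], [y[i] for i in kept])


if __name__ == "__main__":
    # Toy word-count vectors over a hypothetical 4-word vocabulary (not DATAMALL data).
    X = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4], [1, 1, 1, 1], [1, 1, 1, 1]]
    y = [1, 1, -1, -1, 1, -1]  # the last two are conflicting samples at the boundary
    model = svm_nb_fit(X, y)
    print(predict_nb(model, [5, 0, 0, 0]), predict_nb(model, [0, 0, 0, 5]))
```

Pegasos is used here only because it fits in a few lines without third-party dependencies; a library solver such as scikit-learn's `LinearSVC` would be the more typical choice in practice. The two identical, oppositely labelled samples in the toy data illustrate the kind of conflicting pair the trimming step is meant to remove before naive Bayes is trained.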