Novel method for Chinese spam detection based on one-class support vector machines

Donghong Sun, Quang Anh Tran, HaiXin Duan, Guodong Zhang
2005-01-01
Journal of Information and Computational Science
Abstract: With the increasing use of e-mail over the Internet, spam, especially Chinese spam, has become one of the major problems in network security. This paper proposes a novel method based on the One-class Support Vector Machine (SVM) for detecting Chinese spam automatically. Chinese word segmentation and the TF-IDF formula are used to convert e-mails into vectors. The One-class SVM first models the ham distribution and then detects e-mails that fall outside this distribution as spam. Conversely, the One-class SVM can also model the spam distribution and then detect e-mails that fall outside it as ham. An evolving training model method is used to select the best training model for the One-class SVM. An evaluation spam/ham dataset is used to illustrate the performance of our method. Experimental results show that our method reaches detection/false alarm rates of 0.92/0.016 for spam detection and 0.987/0.09 for ham detection, respectively.
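The vectorization step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes each e-mail has already been split into words by a Chinese segmenter (not shown; the token lists are hypothetical) and uses the common tf × log(N/df) weighting with L2 normalization, since the abstract does not state which TF-IDF variant the paper uses. The function names `tfidf_vectors` and `vectorize` are illustrative.

```python
import math
from collections import Counter


def vectorize(words, idf):
    """Map one segmented e-mail to an L2-normalized sparse TF-IDF dict.

    Words missing from `idf`, or with zero idf (present in every training
    e-mail), carry no weight and are dropped.
    """
    tf = Counter(words)
    vec = {w: n * idf[w] for w, n in tf.items() if idf.get(w, 0.0) > 0.0}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def tfidf_vectors(docs):
    """Convert pre-segmented e-mails (lists of words) into TF-IDF vectors.

    Assumed weighting: tf(w, d) * log(N / df(w)), then L2 normalization.
    Returns (vectors, idf) so that unseen e-mails can be vectorized later.
    """
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    idf = {w: math.log(n_docs / count) for w, count in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf


if __name__ == "__main__":
    # Hypothetical e-mails, assumed to be already word-segmented.
    emails = [["会议", "时间", "周五"], ["免费", "发票", "优惠"], ["会议", "资料", "附件"]]
    vectors, idf = tfidf_vectors(emails)
    print(vectors[1])
```

Returning `idf` alongside the vectors lets new, unlabeled e-mails be mapped into the same feature space with `vectorize` before they reach the classifier.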
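The one-class detection step can likewise be illustrated with a small self-contained solver. This sketch is a stand-in under stated assumptions rather than the paper's implementation: it uses the standard ν-formulation of the one-class SVM with an RBF kernel (the abstract names neither a kernel nor parameter values, so `nu` and `gamma` are hypothetical), solves the dual with SMO-style pairwise updates, and leaves out the evolving training-model selection. A model fitted on ham vectors flags anything with a negative decision value as spam; fitting it on spam vectors instead gives the ham detector. The helper `rates` computes detection and false-alarm rates as we read them from the abstract.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) on sparse {feature: weight} dicts."""
    sq = sum((x.get(f, 0.0) - y.get(f, 0.0)) ** 2 for f in set(x) | set(y))
    return math.exp(-gamma * sq)


class OneClassSVM:
    """Minimal nu one-class SVM (assumed formulation). Dual problem:
    minimize 1/2 * a^T K a  subject to  0 <= a_i <= 1/(nu*n),  sum(a) = 1.
    """

    def __init__(self, nu=0.1, gamma=1.0, tol=1e-6, max_iter=10_000):
        if not 0.0 < nu < 1.0:
            raise ValueError("nu must lie strictly between 0 and 1")
        self.nu, self.gamma, self.tol, self.max_iter = nu, gamma, tol, max_iter

    def fit(self, xs):
        n = len(xs)
        c = 1.0 / (self.nu * n)                   # box bound on each alpha
        k = [[rbf(a, b, self.gamma) for b in xs] for a in xs]
        alpha = [1.0 / n] * n                     # feasible start: sums to 1, each <= c
        grad = [sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # K @ alpha
        eps = 1e-12
        for _ in range(self.max_iter):
            up = [t for t in range(n) if alpha[t] < c - eps]    # may still increase
            down = [t for t in range(n) if alpha[t] > eps]      # may still decrease
            i = min(up, key=grad.__getitem__)
            j = max(down, key=grad.__getitem__)
            if grad[j] - grad[i] < self.tol:      # optimality (KKT) reached within tol
                break
            # Move weight from j to i: exact line search, then clip to the box.
            curv = max(k[i][i] + k[j][j] - 2.0 * k[i][j], eps)
            step = min((grad[j] - grad[i]) / curv, c - alpha[i], alpha[j])
            alpha[i] += step
            alpha[j] -= step
            for t in range(n):
                grad[t] += step * (k[t][i] - k[t][j])
        # rho: mean gradient over free support vectors (0 < alpha < c), else the
        # midpoint of the feasible interval (a common rule, also used by LIBSVM).
        free = [grad[t] for t in range(n) if eps < alpha[t] < c - eps]
        if free:
            self.rho = sum(free) / len(free)
        else:
            upper = min(grad[t] for t in range(n) if alpha[t] < c - eps)
            lower = max(grad[t] for t in range(n) if alpha[t] > eps)
            self.rho = (upper + lower) / 2.0
        self.xs, self.alpha, self.c = xs, alpha, c
        return self

    def decision(self, x):
        """>= 0: inside the learned distribution; < 0: outlier."""
        s = sum(a * rbf(sv, x, self.gamma) for a, sv in zip(self.alpha, self.xs) if a > 0)
        return s - self.rho

    def is_outlier(self, x):
        return self.decision(x) < 0


def rates(model, targets, normals):
    """Detection rate: share of target-class e-mails flagged as outliers.
    False-alarm rate: share of training-class e-mails wrongly flagged."""
    detection = sum(model.is_outlier(x) for x in targets) / len(targets)
    false_alarm = sum(model.is_outlier(x) for x in normals) / len(normals)
    return detection, false_alarm


if __name__ == "__main__":
    # Hypothetical 2-feature vectors standing in for TF-IDF vectors of ham.
    ham = [{"w1": 0.0, "w2": 0.0}, {"w1": 0.1, "w2": 0.0},
           {"w1": 0.0, "w2": 0.1}, {"w1": 0.1, "w2": 0.1}]
    spam = [{"w3": 1.0}, {"w1": 3.0, "w2": 3.0}]
    model = OneClassSVM(nu=0.5, gamma=1.0).fit(ham)
    print(rates(model, targets=spam, normals=[{"w1": 0.05, "w2": 0.05}]))  # → (1.0, 0.0)
```

Keeping the vectors as sparse dicts means the kernel only touches features present in either e-mail, which suits TF-IDF output; for real workloads a tested library such as LIBSVM or scikit-learn's `OneClassSVM` would be the safer choice.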