Combining SVM with Orthogonal Centroid Feature Selection for Spam Filtering
Hong-Liang Zhou, Chang-Yong Luo
2014-01-01
Abstract: Email has become ubiquitous in daily communication because it is convenient, economical and easy to use. However, the huge volume of spam email now causes serious problems for email communication. A variety of techniques have been developed to mitigate spam. One main approach is content-based spam filtering, for which classification methods and feature selection algorithms are critical. As a machine learning technique, the Support Vector Machine (SVM) has proven very effective in spam filtering. Feature selection, the process of choosing the most discriminative features from the original high-dimensional feature space for classifier training, largely determines the precision and efficiency of spam filtering. In this paper, we propose a framework that combines SVM with the feature selection algorithm OCFS (Orthogonal Centroid Feature Selection) for spam filtering. Extensive comparative experiments were performed on five benchmark spam corpora (PU1, PU2, PU3, PUA and ZH1). The results show that, compared with other traditional combinations, the combination of SVM and OCFS achieves better performance in terms of Accuracy and F-Measure.
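The abstract names OCFS without stating its scoring rule. The sketch below assumes the commonly cited OCFS criterion: each feature i receives the score sum over classes j of (n_j / n) * (m_j[i] - m[i])^2, where n_j is the number of training emails in class j, m_j is that class's centroid and m is the global centroid. The top-k features are kept. The helper names `ocfs_scores` and `ocfs_select` and the dense list-of-lists input are hypothetical illustrations, not the authors' implementation. The SVM training step that would consume the selected columns is not shown.

```python
from collections import defaultdict


def ocfs_scores(X, y):
    """Score each feature with the assumed OCFS criterion:
    sum over classes j of (n_j / n) * (m_j[i] - m[i]) ** 2,
    where m_j is the centroid of class j and m the global centroid."""
    n = len(X)
    d = len(X[0])
    # Global centroid m over all training emails
    global_mean = [sum(row[i] for row in X) / n for i in range(d)]
    # Group feature vectors by class label (e.g. spam vs. ham)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = [0.0] * d
    for rows in by_class.values():
        n_j = len(rows)
        for i in range(d):
            m_ji = sum(r[i] for r in rows) / n_j  # class-centroid coordinate
            scores[i] += (n_j / n) * (m_ji - global_mean[i]) ** 2
    return scores


def ocfs_select(X, y, k):
    """Return the indices of the k highest-scoring features (ties broken by index)."""
    scores = ocfs_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:k]


# Toy data: feature 0 separates the classes, feature 1 is constant,
# feature 2 is only weakly related to the label (1 = spam, 0 = ham).
X = [[1, 1, 1], [1, 1, 1], [0, 1, 0], [0, 1, 1]]
y = [1, 1, 0, 0]
print(ocfs_select(X, y, 2))  # → [0, 2]
```

Because the score depends only on class centroids, it costs one pass over the data, which is why centroid-based selection scales to the large vocabularies typical of spam corpora. In a pipeline like the one described, the SVM would then be trained only on the selected feature columns.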