Extracting discriminative information from e-mail for spam detection inspired by Immune System

Yuanchun Zhu, Ying Tan
DOI: https://doi.org/10.1109/CEC.2010.5586290
2010-01-01
Abstract: Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for anti-spam. A general anti-spam model is built that incorporates the LC approach with term selection methods and classifiers. In the LC model, each message is divided into areas by a sliding window. In each area, a two-dimensional feature is constructed by calculating the spam and legitimate e-mail concentrations. The features of all areas are then combined into a single feature vector. Several experiments are conducted on four benchmark corpora using 10-fold cross-validation. The results show that the LC approach can extract effective position-correlated information from messages. Compared with the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, the LC approach greatly reduces feature dimensionality and runs much faster.
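The abstract does not spell out the exact concentration formula or windowing scheme, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes (1) the "detector" term sets `spam_set` and `ham_set` come from a simple document-frequency-difference term selection, used here as a stand-in for whichever term selection method is plugged into the model; (2) each message is split into `n_areas` contiguous windows; and (3) a window's spam (legitimate) concentration is the fraction of its tokens found in `spam_set` (`ham_set`). Both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_detector_sets(
    spam_docs: Sequence[Sequence[str]],
    ham_docs: Sequence[Sequence[str]],
    top_k: int = 100,
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical term selection (an assumption, not the paper's method):
    rank terms by document frequency in spam minus that in legitimate mail."""

    def doc_freq(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
        counts: Dict[str, int] = {}
        for doc in docs:
            for term in set(doc):  # count each term once per document
                counts[term] = counts.get(term, 0) + 1
        n = max(len(docs), 1)
        return {t: c / n for t, c in counts.items()}

    df_spam, df_ham = doc_freq(spam_docs), doc_freq(ham_docs)
    vocab = set(df_spam) | set(df_ham)
    diff = {t: df_spam.get(t, 0.0) - df_ham.get(t, 0.0) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (diff[t], t))  # most ham-like first
    # Terms with zero difference are not discriminative and join neither set.
    spam_set = {t for t in ranked[::-1][:top_k] if diff[t] > 0}
    ham_set = {t for t in ranked[:top_k] if diff[t] < 0}
    return spam_set, ham_set


def local_concentration_features(
    tokens: Sequence[str],
    spam_set: Set[str],
    ham_set: Set[str],
    n_areas: int = 3,
) -> List[float]:
    """Split a message into n_areas windows (assumed scheme) and emit a
    (spam concentration, legitimate concentration) pair for each window."""
    features: List[float] = []
    length = len(tokens)
    for i in range(n_areas):
        window = tokens[i * length // n_areas:(i + 1) * length // n_areas]
        if not window:  # short or empty messages yield zero-valued areas
            features.extend([0.0, 0.0])
            continue
        spam_hits = sum(1 for t in window if t in spam_set)
        ham_hits = sum(1 for t in window if t in ham_set)
        features.extend([spam_hits / len(window), ham_hits / len(window)])
    return features


if __name__ == "__main__":
    spam_set, ham_set = build_detector_sets(
        [["win", "cash", "now"], ["cheap", "cash", "offer"]],
        [["meeting", "agenda", "now"], ["project", "meeting", "notes"]],
        top_k=3,
    )
    message = ["cash", "offer", "win", "meeting", "agenda", "notes"]
    print(local_concentration_features(message, spam_set, ham_set, n_areas=2))
    # prints [1.0, 0.0, 0.0, 1.0]
```

In this sketch the resulting vector always has 2 × `n_areas` dimensions, independent of vocabulary size. A Bag-of-Words vector, by contrast, grows with the number of selected terms, which illustrates where the dimensionality reduction claimed above can come from. The vector can be passed to any standard classifier.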