Simplified Chinese spam mail filter: design and performance evaluation

Xu Yong
DOI: https://doi.org/10.3321/j.issn:1002-8331.2007.25.039
2007-01-01
Abstract: Approaches to the problem of unsolicited bulk e-mail, also known as spam, and methods for filtering it are analyzed. Keyword-based filtering and statistical-learning-based filtering are examined, and a new method that combines the two is proposed. Spam filtering is introduced using naive Bayesian decision theory, nearest-neighbor classification, and linear classification based on the perceptron criterion function used in pattern classification. The feature set used by all three classifiers is selected by mutual information. A comparison of the three decision methods shows their respective advantages and disadvantages. The paper also points out a promising approach to spam filtering using mutual information.
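The abstract does not state how the mutual information score is computed. As an illustration only, the following minimal sketch assumes the standard definition of mutual information between the binary presence of a term and the class label, and ranks terms by that score. The function names, the set-of-tokens message representation, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label.

    Assumption: each message is a set of tokens, and a term is modeled as a
    binary event (present / absent). The paper's exact formula is not given
    in the abstract.
    """
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)
    mi = 0.0
    # Only (presence, label) pairs that actually occur contribute to the sum.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi

def select_features(docs, labels, k):
    """Return the k terms with the highest mutual information score."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda t: mutual_information(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy corpus: two spam and two legitimate ("ham") messages.
docs = [
    {"free", "prize", "now"},
    {"free", "offer"},
    {"meeting", "now"},
    {"project", "meeting"},
]
labels = ["spam", "spam", "ham", "ham"]

top_terms = select_features(docs, labels, 2)
```

On this toy corpus, "free" and "meeting" each separate the two classes perfectly and receive the highest score, while "now" appears equally often in both classes and scores zero. The selected terms could then serve as the shared feature set for any of the three classifiers named in the abstract.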