WEIGHTED NAIVE BAYES SPAM FILTERING METHOD BASED ON FEATURE TERM DISCRIMINATION

Hui Wang, Ziwei Huang, Shufen Liu
DOI: https://doi.org/10.3969/j.issn.1000-386x.2015.10.015
2015-01-01
Abstract: How efficiently features are extracted and how the classification algorithm is designed are the two key factors that determine the strengths and weaknesses of content-based spam filtering technology. Targeting the mutual information (MI) feature extraction algorithm and the naive Bayes classification algorithm, we introduce the concept of feature term discrimination (FTD), analyse the differences in the distinguishing capacity of feature terms during categorisation, and propose a feature extraction algorithm that takes both FTD and MI into account. By further incorporating FTD into the design of the classification algorithm, we present a weighted naive Bayes algorithm that solves the content-based filtering problem efficiently. Experimental results show that the improved algorithm achieves significant improvements in recall, precision and accuracy, and that its classification performance is more stable as well.
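As a rough illustration of what a weighted naive Bayes decision rule can look like, the sketch below scales each term's log-likelihood by a per-term weight before summing. It is not the authors' algorithm: the abstract does not give the FTD formula, so the `weights` dictionary is a caller-supplied stand-in for FTD-style scores, and the function names, the Laplace smoothing, and the choice to score only terms present in the message are all assumptions made for this example.

```python
import math
from collections import Counter


def train(docs, labels):
    """Estimate class priors and Laplace-smoothed P(term present | class)."""
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    n_docs = Counter(labels)
    prior = {c: n_docs[c] / len(docs) for c in classes}
    # Document frequency of each term within each class.
    doc_freq = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        doc_freq[y].update(set(d))
    cond = {
        c: {t: (doc_freq[c][t] + 1) / (n_docs[c] + 2) for t in vocab}
        for c in classes
    }
    return prior, cond


def classify(doc, prior, cond, weights):
    """Return the class maximising log P(c) + sum_t w_t * log P(t | c).

    `weights` is a hypothetical per-term weight (a placeholder for an
    FTD-style score); terms without an entry default to 1.0, which
    reduces to the unweighted rule. Unknown terms are skipped.
    """
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for t in set(doc):
            if t in cond[c]:
                s += weights.get(t, 1.0) * math.log(cond[c][t])
        scores[c] = s
    return max(scores, key=scores.get)


# Toy corpus, invented purely for illustration.
docs = [
    ["free", "money", "now"],
    ["free", "offer"],
    ["meeting", "tomorrow"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
prior, cond = train(docs, labels)
```

With this toy data, a message containing both "free" and "meeting" is evenly balanced under the unweighted rule, so the weights decide the outcome: giving "free" a larger weight than "meeting" pushes the message towards spam, and the reverse pushes it towards ham. That is the intuition behind letting more discriminative terms count for more in the classifier.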