An Improved Naive Bayes Classification Algorithm for Unbalanced Data Sets

YAO Yu, DONG Benzhi, CHEN Guangsheng
DOI: https://doi.org/10.13482/j.issn1001-7011.2015.05.027
2015-01-01
Abstract: When training samples are distributed unevenly across classes and are sparse, the classification performance of Naive Bayes is not accurate enough. To solve this problem, a Naive Bayes text classification algorithm based on data smoothing and a weighted complementary set is proposed. A data smoothing algorithm calculates a compensation probability for features missing from the Naive Bayes model, which addresses the data sparseness problem. Because training samples are distributed unevenly across classes, the algorithm represents each category by the features of that category's complementary set, which counters the tendency to recognize the larger categories and ignore the smaller ones. The experimental results show that the proposed algorithm classifies better than traditional Naive Bayes when the training data set is unbalanced.
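The two ideas in the abstract, smoothing for missing features and scoring a class through its complementary set, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the smoothing method or the weighting scheme, so the sketch assumes plain additive (Laplace) smoothing with a hypothetical parameter `alpha` and leaves the weighting out. The function names `train_complement_nb` and `predict` and the toy data are also assumptions made for illustration.

```python
import math
from collections import Counter


def train_complement_nb(docs, labels, alpha=1.0):
    """Estimate smoothed log-probabilities of each token in the COMPLEMENT of each class.

    docs   -- list of token lists
    labels -- list of class labels, one per document
    alpha  -- additive smoothing constant (assumed; the paper's smoothing may differ)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    classes = sorted(set(labels))

    # Token counts per class.
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)

    # Token counts over the whole training set.
    total = Counter()
    for c in classes:
        total.update(counts[c])
    total_tokens = sum(total.values())

    weights = {}
    for c in classes:
        # Complement statistics: everything that is NOT in class c.
        comp_total = total_tokens - sum(counts[c].values())
        denom = comp_total + alpha * len(vocab)
        # Smoothing keeps the probability non-zero for tokens that never
        # occur in the complement, so the logarithm is always defined.
        weights[c] = {
            tok: math.log((total[tok] - counts[c][tok] + alpha) / denom)
            for tok in vocab
        }
    return classes, weights


def predict(model, doc):
    """Pick the class whose complement explains the document worst."""
    classes, weights = model

    def complement_score(c):
        # Tokens outside the training vocabulary are skipped.
        return sum(weights[c][tok] for tok in doc if tok in weights[c])

    return min(classes, key=complement_score)


# Toy unbalanced training set: four "sports" documents, one "tech" document.
docs = [
    ["ball", "goal", "team"],
    ["ball", "team", "win"],
    ["goal", "win", "team"],
    ["team", "ball", "goal"],
    ["code", "bug", "team"],
]
labels = ["sports", "sports", "sports", "sports", "tech"]

model = train_complement_nb(docs, labels)
print(predict(model, ["code", "bug"]))
print(predict(model, ["ball", "goal"]))
```

Because the statistics for the small class are estimated from the much larger complement, they rest on more data than the single document that class contributes on its own, which is the motivation the abstract gives for using the complementary set.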