Learning algorithm with non-balanced data for computer-aided diagnosis of breast cancer

沈晔,李敏丹,夏顺仁
DOI: https://doi.org/10.3785/j.issn.1008-973X.2013.01.001
2013-01-01
Abstract: When a learning algorithm handles non-balanced data in computer-aided diagnosis, the classifier's predictions are undesirably biased: the classification error is small for the majority class (the large sample set) and large for the minority class (the small sample set). A reverse k-nearest-neighbor subsampling method is proposed to address this non-balanced learning problem. By removing noisy and redundant samples from the majority class and keeping representative, reliable samples as the effective samples, the method balances the training set while avoiding the loss of class information that subsampling would otherwise cause. Simulation results on the Breast-cancer dataset from the UCI Machine Learning Repository demonstrate the validity of the algorithm for learning from non-balanced data, and the experimental results show that it clearly outperforms existing methods.
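The abstract does not spell out the selection rule, so the following is only a minimal sketch of one plausible reading of reverse k-nearest-neighbor subsampling, not the authors' implementation. It assumes that a majority-class sample's reverse-kNN count (how many other majority samples list it among their k nearest neighbors) measures how representative it is, that samples with the lowest counts are treated as noise or outliers, and that the majority class is trimmed to the size of the minority class. The function names `reverse_knn_counts` and `rknn_undersample`, the Euclidean distance, and the default `k=3` are all hypothetical choices made for illustration.

```python
import numpy as np


def reverse_knn_counts(X, k):
    """Count, for each row of X, how many other rows have it among their k nearest neighbors."""
    n = X.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other points
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)       # pairwise squared Euclidean distances
    np.fill_diagonal(dist, np.inf)       # a point is never its own neighbor
    knn = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.bincount(knn.ravel(), minlength=n)


def rknn_undersample(X, y, majority_label, k=3):
    """Assumed rule: keep the majority samples with the highest reverse-kNN
    counts until the majority class is as small as the minority class."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    counts = reverse_knn_counts(X[maj_idx], k)
    n_keep = min(len(maj_idx), len(min_idx))
    order = np.argsort(-counts, kind="stable")   # most "representative" first
    keep = maj_idx[order[:n_keep]]
    selected = np.sort(np.concatenate([keep, min_idx]))
    return X[selected], y[selected]
```

Under these assumptions, an isolated majority sample that no other sample counts as a neighbor receives a count of zero and is the first to be dropped, while samples in dense regions are retained. The actual criteria used in the paper for "noisy" and "redundant" samples may differ.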