Cost-guided class noise handling for effective cost-sensitive learning

Xingquan Zhu, Xindong Wu
DOI: https://doi.org/10.1109/ICDM.2004.10108
2004-01-01
Abstract: Research in machine learning, data mining and related areas has produced a wide variety of algorithms for cost-sensitive (CS) classification, where the objective is to minimize the misclassification cost rather than to maximize the classification accuracy. However, these methods assume that training sets do not contain significant noise, which is rarely the case in real-world environments. In this paper, we systematically study the impact of class noise on CS learning, and propose a cost-guided class noise handling algorithm to identify noise for effective CS learning. We call it the cost-guided iterative classification filter (CICF), because it seamlessly integrates costs and an existing classification filter (C. Brodley and M. Friedl, 1999) for noise identification. Whereas existing efforts give equal weight to noise in all classes, CICF puts more emphasis on expensive classes, which makes it especially successful on datasets with a large cost ratio. Experimental results and comparative studies on real-world datasets indicate that the existence of noise may seriously corrupt the performance of CS classifiers, and that by adopting the proposed CICF algorithm, we can significantly reduce the misclassification cost of a CS classifier in noisy environments.
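To make the cost-sensitive objective in the abstract concrete, the following minimal sketch compares two sets of predictions by accuracy and by total misclassification cost. It is not the CICF algorithm and is not taken from the paper: the class names, the cost matrix (a cost ratio of 10), the labels, and the helper names `total_cost` and `accuracy` are all illustrative assumptions.

```python
# Illustrative cost matrix (an assumption, not from the paper):
# COST[true_label][predicted_label]. Missing an expensive "pos" instance
# costs 10, a false alarm on a cheap "neg" instance costs 1.
COST = {
    "neg": {"neg": 0, "pos": 1},
    "pos": {"neg": 10, "pos": 0},
}


def total_cost(y_true, y_pred, cost):
    """Sum the cost of every (true, predicted) pair."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: 2 expensive "pos" instances and 8 cheap "neg" instances.
y_true = ["pos", "pos"] + ["neg"] * 8

# Predictions A: misses one "pos" instance, no false alarms.
pred_a = ["neg", "pos"] + ["neg"] * 8

# Predictions B: finds both "pos" instances but raises 3 false alarms.
pred_b = ["pos", "pos"] + ["pos"] * 3 + ["neg"] * 5

acc_a, cost_a = accuracy(y_true, pred_a), total_cost(y_true, pred_a, COST)
acc_b, cost_b = accuracy(y_true, pred_b), total_cost(y_true, pred_b, COST)

print(f"A: accuracy={acc_a:.1f}, cost={cost_a}")
print(f"B: accuracy={acc_b:.1f}, cost={cost_b}")
```

Under these assumed costs, predictions A are more accurate but more expensive than predictions B, which is why a cost-sensitive learner may prefer B. It also suggests why a mislabeled instance of the expensive class can matter more than a mislabeled instance of the cheap class.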