Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task

Irfan Ahmad
DOI: https://doi.org/10.1007/978-3-030-20518-8_35
2019-01-01
Advances in Computational Intelligence
Abstract: Machine learning is an important area of Artificial Intelligence, with applications in almost all fields of science. Supervised machine learning for classification problems involves training classifiers with labeled data. There are many classifiers, each with its own strengths and weaknesses in terms of classification accuracy and the ability to deal with noisy class labels in the training data. Limited work has been reported in the literature on the performance of classifiers under different levels of class noise in the training data. The current work presents a thorough investigation of the effects of class mislabeling on the performance of different classifiers. Five commonly used classifiers (SVM, random forest, ANN, naïve Bayes, and KNN) were investigated on a benchmark database of handwritten digit images. The classifiers were trained with different levels of labeling noise, ranging from low, to medium, to very high, and their recognition performance was evaluated and compared. The study led to some interesting observations, which are presented in this paper.
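The abstract describes training on labels corrupted at several noise levels, and that step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the noise model, so the sketch uses uniform (symmetric) flipping to a different class, and the function name `inject_label_noise`, its parameters, and the example noise rates are hypothetical rather than taken from the paper.

```python
import numpy as np


def inject_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Assumption: uniform (symmetric) class noise, where each selected
    label is replaced by a different class chosen uniformly at random.
    The paper's actual noise model is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Pick exactly round(noise_rate * n) distinct samples to corrupt.
    n_flip = int(round(noise_rate * labels.size))
    flip_idx = rng.choice(labels.size, size=n_flip, replace=False)

    # An offset in 1..num_classes-1 guarantees the new label differs
    # from the original one after the modulo wrap-around.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


if __name__ == "__main__":
    # Synthetic digit labels 0..9, standing in for a real benchmark.
    clean = np.arange(1000) % 10
    for rate in (0.1, 0.4, 0.8):  # illustrative low / medium / very high
        noisy = inject_label_noise(clean, rate)
        print(rate, float(np.mean(noisy != clean)))
```

The noisy labels would replace the training labels only; the test labels stay clean so that accuracy still measures true recognition performance.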