Classification Performance Analysis of Decision Tree-Based Algorithms with Noisy Class Variable

Abdulmajeed Atiah Alharbi
DOI: https://doi.org/10.1155/2024/6671395
IF: 1.4
2024-02-02
Discrete Dynamics in Nature and Society
Abstract: Class noise is a common issue that degrades the performance of classification techniques on real-world data sets. It arises when the class variable of a data set contains incorrect class labels. With noisy data, the robustness of a classification technique against noise can be more important than its performance on noise-free data sets. The decision tree method is one of the most popular techniques for classification tasks, and C4.5, CART, and random forest (RF) are among the most widely used decision tree algorithms. The aim of this paper is to determine which of these algorithms is preferable for building decision trees, in terms of performance and robustness against class noise. To this end, we study and compare the performance of the models when applied to data sets whose class variable contains noise. The results indicate that the RF algorithm is more robust to data sets with a noisy class variable than the other algorithms.
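To make the notion of a noisy class variable concrete, the following is a minimal sketch of one common way to simulate class noise: replacing a fixed fraction of labels with a different class chosen uniformly at random. The abstract does not describe the paper's noise-injection protocol, so the uniform scheme, the function name `inject_class_noise`, and the parameters `noise_rate` and `seed` are all assumptions made for illustration.

```python
import random
from typing import Hashable, List, Optional, Sequence


def inject_class_noise(
    labels: Sequence[Hashable],
    noise_rate: float,
    seed: Optional[int] = None,
) -> List[Hashable]:
    """Return a copy of `labels` with a fraction `noise_rate` mislabeled.

    Assumption (not taken from the paper): noise is uniform, i.e. each
    selected instance receives a different class drawn at random from
    the other classes present in `labels`.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")

    # Sort by repr so the class order is deterministic for a given seed.
    classes = sorted(set(labels), key=repr)
    if len(classes) < 2:
        raise ValueError("need at least two classes to mislabel anything")

    rng = random.Random(seed)
    # Number of instances to corrupt, rounded to the nearest integer.
    n_noisy = round(noise_rate * len(labels))

    noisy = list(labels)  # the input sequence is left unmodified
    for i in rng.sample(range(len(labels)), n_noisy):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


if __name__ == "__main__":
    clean = ["yes"] * 5 + ["no"] * 5
    noisy = inject_class_noise(clean, noise_rate=0.2, seed=0)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"{changed} of {len(clean)} labels were changed")
```

In a robustness study of this kind, one would typically apply such a function to the training labels only, fit each classifier on the corrupted labels, and evaluate on clean test labels at several noise rates. As a hedged pointer, scikit-learn's `DecisionTreeClassifier` (a CART-style learner) and `RandomForestClassifier` could serve for two of the three algorithms; scikit-learn does not ship C4.5, so a separate implementation would be needed for it.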
mathematics, interdisciplinary applications, multidisciplinary sciences