Robust Loss Functions for Training Decision Trees with Noisy Labels

Jonathan Wilton, Nan Ye
2024-01-23
Abstract: We consider training decision trees using noisily labeled data, focusing on loss functions that can lead to robust learning algorithms. Our contributions are threefold. First, we offer novel theoretical insights on the robustness of many existing loss functions in the context of decision tree learning. We show that some of the losses belong to a class of what we call conservative losses, and the conservative losses lead to an early stopping behavior during training and noise-tolerant predictions during testing. Second, we introduce a framework for constructing robust loss functions, called distribution losses. These losses apply percentile-based penalties based on an assumed margin distribution, and they naturally allow adapting to different noise rates via a robustness parameter. In particular, we introduce a new loss called the negative exponential loss, which leads to an efficient greedy impurity-reduction learning algorithm. Lastly, our experiments on multiple datasets and noise settings validate our theoretical insight and the effectiveness of our adaptive negative exponential loss.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to design robust loss functions for training decision trees on data with noisy labels. Specifically, the authors focus on improving the robustness and predictive performance of decision tree learning algorithms through the choice of loss function.

### Decomposition of the Main Problem

1. **Impact of Label Noise**:
   - Label noise (i.e., incorrect labels) often arises because data is difficult to label or because labels are collected through crowdsourcing platforms. Such noise degrades both the training and the prediction performance of a model.
2. **Deficiencies of Existing Methods**:
   - Existing approaches to label noise include removing mislabeled samples, implicit or explicit regularization, and robust loss functions. For decision tree learning, however, the design and understanding of robust loss functions under label noise have not been fully studied.
3. **Characteristics of Decision Tree Learning**:
   - Decision tree learning is usually described as a greedy impurity-reduction algorithm, which is equivalent to minimizing certain loss functions. Designing a robust loss function suited to decision trees can therefore improve their performance in noisy settings.

### Main Contributions of the Paper

1. **Theoretical Analysis**:
   - The concept of conservative losses is introduced. Some existing losses are shown to belong to this class, and conservative losses are shown to lead to early stopping during training and noise-tolerant predictions at test time.
2. **Framework Construction**:
   - A framework for constructing robust loss functions, called distribution losses, is introduced. These losses apply percentile-based penalties based on an assumed margin distribution, and they adapt to different noise rates via a robustness parameter.
   - In particular, a new negative exponential loss is introduced, which leads to an efficient greedy impurity-reduction learning algorithm.
3. **Experimental Verification**:
   - Experiments on multiple datasets and noise settings validate the theoretical insights and the effectiveness of the adaptive negative exponential loss.

### Summary

This paper analyzes the robustness of existing loss functions for decision tree learning, proposes the distribution loss framework for constructing new robust losses, and supports both with experiments under label noise. The results are relevant to applying decision trees in real-world settings where labels are noisy.
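To make the greedy impurity-reduction view of decision tree learning concrete, here is a minimal sketch of one split search on a single numeric feature with binary labels. It is an illustration only, not the paper's algorithm: the function names (`gini`, `node_impurity`, `best_split`) are hypothetical, and the standard Gini impurity stands in as the impurity function, since this summary does not give the formula for the negative exponential loss.

```python
from typing import Callable, List, Optional, Tuple


def gini(p: float) -> float:
    """Gini impurity of a binary node whose positive-class fraction is p."""
    return 2.0 * p * (1.0 - p)


def node_impurity(labels: List[int], impurity: Callable[[float], float]) -> float:
    """Impurity of a node holding 0/1 labels; an empty node counts as pure."""
    if not labels:
        return 0.0
    return impurity(sum(labels) / len(labels))


def best_split(
    xs: List[float],
    ys: List[int],
    impurity: Callable[[float], float] = gini,
) -> Tuple[Optional[float], float]:
    """Return (threshold, impurity reduction) for the best split x <= threshold.

    Returns (None, 0.0) when no split strictly reduces the weighted impurity,
    in which case a greedy learner would stop and keep the node as a leaf.
    """
    n = len(ys)
    parent = node_impurity(ys, impurity)
    best_threshold, best_gain = None, 0.0
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weighted average impurity of the two children.
        children = (
            len(left) * node_impurity(left, impurity)
            + len(right) * node_impurity(right, impurity)
        ) / n
        gain = parent - children
        if gain > best_gain:
            best_threshold, best_gain = t, gain
    return best_threshold, best_gain
```

For example, `best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])` selects the threshold 2.5, which separates the two classes. Passing a different `impurity` callable is where an alternative, more noise-robust impurity would plug in under this sketch.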