AnyLoss: Transforming Classification Metrics into Loss Functions

Doheon Han, Nuno Moniz, Nitesh V Chawla
2024-05-24
Abstract: Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, making it very difficult to generate a differentiable loss function that could directly optimize them. The lack of solutions to bridge this challenge not only hinders our ability to solve difficult tasks, such as imbalanced learning, but also requires the deployment of computationally expensive hyperparameter search processes in model selection. In this paper, we propose a general-purpose approach that transforms any confusion matrix-based metric into a loss function, AnyLoss, that is available in optimization processes. To this end, we use an approximation function to make a confusion matrix represented in a differentiable form, and this approach enables any confusion matrix-based metric to be directly used as a loss function. The mechanism of the approximation function is provided to ensure its operability and the differentiability of our loss functions is proved by suggesting their derivatives. We conduct extensive experiments under diverse neural networks with many datasets, and we demonstrate their general availability to target any confusion matrix-based metrics. Our method, especially, shows outstanding achievements in dealing with imbalanced datasets, and its competitive learning speed, compared to multiple baseline models, underscores its efficiency.
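The idea in the abstract can be illustrated with a minimal sketch. It assumes the approximation function is a steep sigmoid centred at 0.5; that exact form, the `scale` parameter and its value, and the function names `amplify`, `soft_confusion_matrix`, and `soft_f1_loss` are illustrative assumptions, not taken from the paper. F1 is used only as one example of a confusion matrix-based metric.

```python
import math

def amplify(p, scale=10.0):
    """Push a class probability toward 0 or 1.

    Assumed form: a sigmoid centred at 0.5 with slope `scale`
    (hypothetical parameter, not a value from the paper).
    """
    return 1.0 / (1.0 + math.exp(-scale * (p - 0.5)))

def soft_confusion_matrix(probs, labels, scale=10.0):
    """Return (tp, fn, fp, tn) as real-valued sums of amplified probabilities.

    Each entry is a smooth function of the probabilities, so a metric
    built from these entries can be differentiated.
    """
    tp = fn = fp = tn = 0.0
    for p, y in zip(probs, labels):
        a = amplify(p, scale)
        tp += y * a                # positive label, predicted positive
        fn += y * (1.0 - a)        # positive label, predicted negative
        fp += (1 - y) * a          # negative label, predicted positive
        tn += (1 - y) * (1.0 - a)  # negative label, predicted negative
    return tp, fn, fp, tn

def soft_f1_loss(probs, labels, scale=10.0):
    """Loss = 1 - F1, with F1 computed from the soft confusion matrix."""
    tp, fn, fp, _ = soft_confusion_matrix(probs, labels, scale)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    return 1.0 - f1

probs = [0.9, 0.8, 0.3, 0.1]   # sigmoid outputs of some model (made-up values)
labels = [1, 0, 1, 0]
print(soft_confusion_matrix(probs, labels))
print(soft_f1_loss(probs, labels))
```

The four soft entries always sum to the number of samples, as in an ordinary confusion matrix. In an automatic-differentiation framework the same arithmetic would be written with tensor operations so that gradients can flow back to the model parameters.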
Machine Learning
What problem does this paper attempt to address?
The paper "AnyLoss: Transforming Classification Metrics into Loss Functions" addresses the challenge of using confusion matrix-based evaluation metrics as differentiable loss functions in the training of machine learning models, particularly focusing on binary classification tasks. The key points and contributions of the paper are summarized below:

### Problem Statement

The paper identifies that many evaluation metrics used in binary classification are derived from the confusion matrix in a non-differentiable form. This makes it difficult to create a differentiable loss function that directly optimizes these metrics. The lack of a solution to bridge this gap hinders the ability to solve complex tasks, such as imbalanced learning, and necessitates computationally expensive hyperparameter searches in model selection.

### Proposed Solution

The authors propose a general-purpose approach called **AnyLoss**, which transforms any confusion matrix-based metric into a loss function that can be used in optimization processes. This is achieved through the following steps:

1. **Approximation Function**: An approximation function is used to represent the confusion matrix in a differentiable form. This function takes the class probability (output of the sigmoid function) and amplifies it to values close to 0 or 1, enabling the construction of a confusion matrix that can be used to compute evaluation metrics.
2. **Differentiability**: The mechanism of the approximation function ensures