Abstract: Advanced classification algorithms are increasingly used in safety-critical applications such as healthcare and engineering. In such applications, misclassifications made by ML algorithms can result in substantial financial or health-related losses. To better anticipate and prepare for such losses, the algorithm user seeks an estimate of the probability that the algorithm misclassifies a sample. We refer to this task as risk assessment. For a variety of models and datasets, we numerically analyze the performance of different methods in solving the risk-assessment problem. We consider two solution strategies: a) calibration techniques that calibrate the output probabilities of classification models to provide accurate probability outputs; and b) a novel approach based upon the prediction interval generation technique of conformal prediction. Our conformal-prediction-based approach is model and data-distribution agnostic, simple to implement, and provides reasonable results for a variety of use cases. We compare the different methods on a broad variety of models and datasets.

What problem does this paper attempt to address?

This paper addresses how to assess the misclassification risk of multi-class classification models. In safety-critical applications such as healthcare and engineering, misclassifications by machine-learning algorithms can cause serious financial or health-related losses. Users therefore need an estimate of the probability that the model misclassifies a sample; the paper calls this task risk assessment.

### Problem Background

In safety-critical applications, a misclassification can have serious consequences. For example:

- **Skin Cancer Detection**: Misclassifying a healthy patient may lead to unnecessary treatment, while misclassifying a diseased patient may delay treatment.
- **Fault Detection in Offshore Wind Farms**: Missed faults or false alarms can cause large economic losses.

Users therefore want to anticipate and prepare for these errors, and in particular to know the probability that the model misclassifies a sample.

### Definition of the Risk Assessment Problem

Given the input \(X\in\mathbb{R}^d\), the true label \(Y\in\{0, 1,\ldots, K\}\) and the model output \(\hat{Y}(X)\in\{0, 1,\ldots, K\}\), the goal is to estimate the misclassification probability \(P(Y\neq\hat{Y}(X))\). Here \(K\) represents the number of classes. For simplicity, the model is assumed to make point predictions \(\hat{Y}(X)\). In more complex cases, the model can output a prediction interval (PI) containing the top \(k\) categories; the misclassification probability is then expressed in terms of the PI instead of \(\hat{Y}(X)\).

### Deficiencies of Existing Methods

Existing classification models usually output a probability for each category, but these probabilities are often over-confident: they overstate the likelihood that the predicted category is correct. Reading the risk directly from them therefore underestimates the probability of model failure, which is a serious drawback in safety-critical applications. Existing methods need to be improved so that they reflect the true misclassification probability more accurately.

### Solutions

The paper considers two main solution strategies:

1. **Calibration Techniques**: Calibrate the probabilities output by the model so that they are closer to the true probabilities, which yields a more accurate risk assessment.
2. **New Method Based on Conformal Prediction**: Use the prediction interval generation technique of conformal prediction. This method is agnostic to the model and the data distribution, is simple to implement, and gives reasonable results across a variety of use cases.

### Conclusion

The paper compares the performance of various risk-assessment methods through experiments on different models and datasets, aiming to find a more accurate and conservative risk-assessment method, especially for safety-critical applications.

In summary, the core problem of this paper is to develop an effective method for evaluating the misclassification risk of multi-class classification algorithms, so that potential errors and losses can be better handled in practical applications.
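To make the quantity \(P(Y\neq\hat{Y}(X))\) described above more concrete, here is a minimal Python sketch of three ways one might estimate it. It is an illustration under stated assumptions, not the paper's implementation. The function names (`empirical_risk`, `confidence_based_risk`, `conformal_risk`) are hypothetical, and labels are assumed to be integer column indices into the probability arrays. The conformal variant uses a standard split-conformal construction (nonconformity score one minus the probability of the label, risk taken as the largest p-value among the non-predicted labels); whether this matches the paper's conformal approach is an assumption.

```python
import numpy as np


def empirical_risk(y_true, y_pred):
    """Average misclassification rate on held-out data: an estimate of P(Y != Y_hat(X))."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


def confidence_based_risk(probs):
    """Per-sample risk read off the model's (ideally calibrated) probabilities: 1 - max_k p_k(x)."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def conformal_risk(cal_probs, cal_labels, test_probs):
    """Per-sample risk from split-conformal p-values (illustrative assumption).

    Returns, for each test sample, the largest conformal p-value among the
    labels the model did NOT predict.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_probs = np.asarray(test_probs, dtype=float)
    n = len(cal_labels)

    # Nonconformity score of each calibration sample: 1 - probability of its true label.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]

    # Candidate scores for every (test sample, label) pair, shape (m, K).
    test_scores = 1.0 - test_probs

    # p-value: share of calibration scores at least as large as the candidate score.
    counts = (cal_scores[None, None, :] >= test_scores[:, :, None]).sum(axis=2)
    p_values = (counts + 1) / (n + 1)

    # Exclude the predicted label, then take the largest remaining p-value.
    pred = test_probs.argmax(axis=1)
    p_values[np.arange(len(pred)), pred] = -np.inf
    return p_values.max(axis=1)
```

The first function gives a single average risk for the whole model, while the other two give a risk per sample. Passing calibrated probabilities to `confidence_based_risk` corresponds loosely to the calibration strategy. `conformal_risk` relies only on a held-out calibration set and on the ranking of scores, so it does not require the probabilities themselves to be well calibrated.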

An In-Depth Examination of Risk Assessment in Multi-Class Classification Algorithms

Distribution-free risk assessment of regression-based machine learning algorithms

Using random forest for reliable classification and cost-sensitive learning for medical diagnosis

An empirical study of classification algorithm evaluation for financial risk prediction

Probabilistic Safety Regions Via Finite Families of Scalable Classifiers

Quantifying Uncertainty in Deep Learning Classification with Noise in Discrete Inputs for Risk-Based Decision Making

Misclassification Risk and Uncertainty Quantification in Deep Classifiers

MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis

Risk-aware Classification via Uncertainty Quantification

Learning Optimized Risk Scores

Examining imbalanced classification algorithms in predicting real-time traffic crash risk

Conformal Risk Control for Ordinal Classification

SCRIB: Set-classifier with Class-specific Risk Bounds for Blackbox Models

Performance analysis of various classification algorithms for providing competency training to workplace risk prevention

BSM loss: A superior way in modeling aleatory uncertainty of fine_grained classification

Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

On (assessing) the fairness of risk score models

(Un)fairness in Post-operative Complication Prediction Models

A Machine Learning-Based Risk Assessment System Prediction Algorithm for Examining Medical Insurance Costs

Uncertainty Aware Training to Improve Deep Learning Model Calibration for Classification of Cardiac MR Images