When to Accept Automated Predictions and When to Defer to Human Judgment?

Daniel Sikar, Artur Garcez, Tillman Weyde, Robin Bloomfield, Kaleem Peeroo
2024-08-13
Abstract: Ensuring the reliability and safety of automated decision-making is crucial. It is well-known that data distribution shifts in machine learning can produce unreliable outcomes. This paper proposes a new approach for measuring the reliability of predictions under distribution shifts. We analyze how the outputs of a trained neural network change using clustering to measure distances between outputs and class centroids. We propose this distance as a metric to evaluate the confidence of predictions under distribution shifts. We assign each prediction to a cluster with centroid representing the mean softmax output for all correct predictions of a given class. We then define a safety threshold for a class as the smallest distance from an incorrect prediction to the given class centroid. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across these datasets and network models, and indicate that the proposed metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators given a distribution shift.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the reliability and safety of automated decision-making by machine learning models under distribution shift. Specifically, the authors propose a new method for measuring the reliability of neural network predictions when the data encountered in deployment is distributed differently from the training data.

#### The distribution shift problem

Distribution shift means that the data a model encounters after deployment follows a different distribution from its training data. This can cause a significant decline in model performance, which matters most in high-risk fields where wrong predictions can have serious consequences. Distribution shift can take forms such as covariate shift, concept drift, or domain shift.

#### The proposed new method

To address this problem, the authors propose a method based on clustering and softmax distance to evaluate the reliability of predictions. The steps are as follows:

1. **Clustering and class center calculation**:
   - Calculate the class center (centroid) of each class from the correct predictions on the training set. The centroid is the mean softmax output of all correct predictions of that class.
   - For each prediction, calculate its distance from the corresponding class centroid.
2. **Define a safety threshold**:
   - For each class, define a safety threshold as the smallest distance from an incorrect prediction to the class centroid. This threshold is used to decide whether a prediction is reliable enough to accept.
3. **Evaluation method**:
   - The method is evaluated on MNIST with a Convolutional Neural Network (CNN) and on CIFAR-10 with a Vision Transformer (ViT).
   - The results are consistent across both datasets and both models, and indicate that the method can offer an efficient way to decide when to accept automated predictions and when to defer to a human operator.

#### Main contributions

- Propose a lightweight method that uses distance measurement and clustering to quantify the reliability of neural network predictions under distribution shift.
- Combine metric-based and accuracy-based methods to deal with distribution shift, and demonstrate the approach on different network architectures (a CNN and a ViT).

### Summary

The core problem of this paper is ensuring the reliability and safety of machine learning models under distribution shift. By introducing a method based on clustering and softmax distance, the authors provide a way to evaluate the reliability of individual predictions and so to balance automated decision-making against human intervention.
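The centroid and threshold steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distance (the text only says "distance"), assumes the threshold for a class is computed over incorrect predictions that were assigned to that class, and assumes a prediction is accepted when its distance falls strictly below the threshold. The function names `fit_centroids_and_thresholds` and `accept_or_defer` are hypothetical, and the toy softmax values are made up for the example.

```python
import numpy as np


def fit_centroids_and_thresholds(probs, labels, num_classes):
    """Compute per-class centroids and safety thresholds.

    probs:  (n, k) array of softmax outputs.
    labels: (n,) array of true class indices.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    preds = probs.argmax(axis=1)
    correct = preds == labels

    centroids = np.zeros((num_classes, probs.shape[1]))
    # Assumption: with no observed errors for a class, nothing is deferred.
    thresholds = np.full(num_classes, np.inf)

    for c in range(num_classes):
        hits = correct & (preds == c)      # correct predictions of class c
        misses = ~correct & (preds == c)   # wrong predictions assigned to c
        if not hits.any():
            # Assumption: no centroid can be estimated, so always defer.
            thresholds[c] = 0.0
            continue
        centroids[c] = probs[hits].mean(axis=0)
        if misses.any():
            # Assumed Euclidean distance; threshold is the nearest error.
            dists = np.linalg.norm(probs[misses] - centroids[c], axis=1)
            thresholds[c] = dists.min()
    return centroids, thresholds


def accept_or_defer(probs, centroids, thresholds):
    """Return predicted classes and a boolean mask (True = accept)."""
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)
    dists = np.linalg.norm(probs - centroids[preds], axis=1)
    return preds, dists < thresholds[preds]


# Toy two-class example with made-up softmax outputs.
train_probs = [[0.90, 0.10], [0.80, 0.20], [0.60, 0.40],
               [0.10, 0.90], [0.20, 0.80], [0.45, 0.55]]
train_labels = [0, 0, 1, 1, 1, 0]
centroids, thresholds = fit_centroids_and_thresholds(
    train_probs, train_labels, num_classes=2)

new_probs = [[0.95, 0.05], [0.55, 0.45]]
preds, accepted = accept_or_defer(new_probs, centroids, thresholds)
for p, ok in zip(preds, accepted):
    print(f"class {p}: {'accept' if ok else 'defer to human'}")
```

In this toy run the confident output sits close to the class 0 centroid and is accepted, while the near-uniform output lies farther away than the nearest known error and is deferred. Under these assumptions the threshold is deliberately conservative: any prediction at least as far from the centroid as the closest observed mistake is handed to a human.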