Abstract: Machine learning models are increasingly used as a utility in many applications. Companies deliver pre-trained models encapsulated as application programming interfaces (APIs), which developers combine with third-party components and their own models and data to create complex data products that solve specific problems. The complexity of such products, together with the lack of control over and knowledge of the internals of each component, unavoidably causes effects such as lack of transparency, difficulty in auditing, and the emergence of potentially uncontrolled risks. These products are effectively black boxes. Accountability of such solutions is a challenge for auditors and for the machine learning community. In this work, we propose a wrapper that, given a black-box model, enriches its output prediction with a measure of uncertainty when applied to a target domain. To develop the wrapper, we follow these steps:

Modeling the distribution of the output. In a text classification setting, the output is a probability distribution $p(y \mid X, w^*)$ over the classes to predict, $y$, given an input text $X$ and the pre-trained model with parameters $w^*$. We model this output as a random variable to measure the variability that data noise causes in the output. Here we consider the output distribution to come from a Dirichlet probability density function, thus $p(y \mid X, w^*) \sim \mathrm{Dir}(\alpha)$.

Decomposition of the Dirichlet concentration parameter. To relate the output of the classifier to the concentration parameter of the Dirichlet distribution, we propose a decomposition of the concentration parameter into two terms: $\alpha = \beta y$. The role of the scalar $\beta$ is to control the spread of the distribution around its expected value, i.e. the original prediction $y$.

Training the wrapper. Sentences are represented as the average of their word embeddings. This representation feeds a neural network that outputs a single regression value that models the parameter $\beta$. For each input $i$, we combine $\beta_i$ and the black-box prediction to obtain the corresponding distribution for the output, $y_{m,i} \sim \mathrm{Dir}(\alpha_i)$. Using Monte Carlo sampling, we approximate the expected value of the classification probabilities, $\hat{y}_i = \frac{1}{M} \sum_{m=1}^{M} y_{m,i} \approx \mathbb{E}[y_i]$, and we train the model by applying a cross-entropy loss over the predictions and the labels.

Obtaining an uncertainty score from the wrapper. To obtain a numerical value for the uncertainty of a prediction, we draw samples from the resulting $\mathrm{Dir}(\alpha_i)$ to evaluate the predictive entropy with $H(\hat{y}_i) = -\sum_{c} \hat{y}_{i,c} \log \hat{y}_{i,c}$, thus obtaining a numerical score for the uncertainty of each prediction.

Using uncertainty for rejection. Based on this wrapper, we provide an actionable mechanism to mitigate risk in the form of decision rejection: once equipped with a value for the uncertainty of a given prediction, we can choose not to issue that prediction when the risk or uncertainty in that decision is significant. This results in a rejection system that keeps the more confident predictions, discards the more uncertain ones, and improves the trustworthiness of the resulting system.

We showcase the proposed technique and methodology in a practical scenario where we apply a simulated NLP-based sentiment analysis API to different domains. In each experiment, we train a sentiment classifier using text reviews of products in a source domain. We apply the pre-trained black box to obtain predictions for the reviews from a target domain. The tuples of review plus black-box prediction are then used to train the wrapper that produces the uncertainty. Finally, we use the uncertainty score to sort the predictions from more to less uncertain, and we search for a rejection point that maximizes three performance measures: non-rejected accuracy, classification quality, and rejection quality.
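The sampling and entropy steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `sample_wrapped_predictions` and `predictive_entropy` are hypothetical, and the scalar that scales the black-box prediction is fixed by hand here rather than predicted by a trained network. The sketch only shows that a Dirichlet whose concentration parameter is the prediction multiplied by a positive scalar stays centered on that prediction, while the scalar controls how tightly the samples concentrate around it.

```python
import numpy as np

def sample_wrapped_predictions(y, beta, n_samples=1000, rng=None):
    """Draw samples from a Dirichlet with concentration beta * y (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Clip and renormalize so every concentration entry is strictly positive.
    y = np.clip(np.asarray(y, dtype=float), 1e-6, None)
    y = y / y.sum()
    alpha = beta * y  # assumed decomposition: scalar times black-box prediction
    return rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, n_classes)

def predictive_entropy(samples, eps=1e-12):
    """Entropy of the Monte Carlo mean of the sampled class probabilities."""
    mean_probs = samples.mean(axis=0)
    return float(-np.sum(mean_probs * np.log(mean_probs + eps)))

rng = np.random.default_rng(0)
y = np.array([0.7, 0.2, 0.1])  # toy black-box output for one input

spread = sample_wrapped_predictions(y, beta=2.0, rng=rng)    # small scalar: wide spread
tight = sample_wrapped_predictions(y, beta=200.0, rng=rng)   # large scalar: concentrated

print("per-class variance (beta=2):  ", spread.var(axis=0))
print("per-class variance (beta=200):", tight.var(axis=0))
print("predictive entropy (beta=200):", predictive_entropy(tight))
```

Both sample sets average to roughly the same prediction, because the mean of a Dirichlet is its concentration vector divided by its sum; only the spread differs.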
Experiments demonstrate the effectiveness of the uncertainty measure computed by the wrapper and show its high correlation with poor-quality predictions and misclassifications. In all cases, the proposed uncertainty metric outperforms traditional uncertainty measures.
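The rejection mechanism described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the evaluation code of the paper: the helper names are hypothetical, the uncertainty scores and correctness flags are made-up values, and the two measures are assumed definitions (non-rejected accuracy as the accuracy over kept predictions, classification quality as the fraction of predictions that are either correct and kept or incorrect and rejected). The abstract itself does not define these measures.

```python
import numpy as np

def reject_most_uncertain(uncertainty, reject_fraction):
    """Return a boolean mask that is True for the predictions that are kept."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    n_reject = int(round(reject_fraction * uncertainty.size))
    order = np.argsort(-uncertainty, kind="stable")  # most uncertain first
    keep = np.ones(uncertainty.size, dtype=bool)
    keep[order[:n_reject]] = False
    return keep

def non_rejected_accuracy(correct, keep):
    """Accuracy over the predictions that were not rejected (assumed definition)."""
    correct = np.asarray(correct, dtype=bool)
    return float(correct[keep].mean()) if keep.any() else float("nan")

def classification_quality(correct, keep):
    """Share of decisions that are correct-and-kept or wrong-and-rejected (assumed definition)."""
    correct = np.asarray(correct, dtype=bool)
    good = np.sum(correct & keep) + np.sum(~correct & ~keep)
    return float(good / correct.size)

# Toy data: one uncertainty score and one correctness flag per prediction.
uncertainty = np.array([0.10, 0.90, 0.30, 0.80, 0.20, 0.70])
correct = np.array([True, False, True, False, True, True])

keep_all = reject_most_uncertain(uncertainty, reject_fraction=0.0)
keep = reject_most_uncertain(uncertainty, reject_fraction=1 / 3)

print("accuracy without rejection:", non_rejected_accuracy(correct, keep_all))
print("non-rejected accuracy:     ", non_rejected_accuracy(correct, keep))
print("classification quality:    ", classification_quality(correct, keep))
```

Sweeping `reject_fraction` over a grid and recording these measures at each value is one simple way to search for a rejection point.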

Logit-based Uncertainty Measure in Classification

Theoretical characterization of uncertainty in high-dimensional linear classification

How to evaluate uncertainty estimates in machine learning for regression?

Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty

A Novel Regression Loss for Non-Parametric Uncertainty Optimization

Uncertainty Quantification in Deep Neural Networks through Statistical Inference on Latent Space

Uncertainty Quantification Metrics for Deep Regression

Uncertainty Quantification in Logistic Regression using Random Fuzzy Sets and Belief Functions

Approaching Neural Network Uncertainty Realism

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

The Peril of Popular Deep Learning Uncertainty Estimation Methods

Uncertain logistic regression models

A Geometric Method for Improved Uncertainty Estimation in Real-time

Dirichlet uncertainty wrappers for actionable algorithm accuracy accountability and auditability

Density Uncertainty Layers for Reliable Uncertainty Estimation

Understanding Measures of Uncertainty for Adversarial Example Detection

Uncertainty Voting Ensemble for Imbalanced Deep Regression

Unified Uncertainty Calibration

Awareness of uncertainty in classification using a multivariate model and multi-views

A Meta-heuristic Approach to Estimate and Explain Classifier Uncertainty