Abstract: Effective machine learning models learn both robust features that directly determine the outcome of interest (e.g., an object with wheels is more likely to be a car), and shortcut features (e.g., an object on a road is more likely to be a car). The latter can be a source of error under distributional shift, when the correlations change at test-time. The prevailing sentiment in the robustness literature is to avoid such correlative shortcut features and learn robust predictors. However, while robust predictors perform better on worst-case distributional shifts, they often sacrifice accuracy on majority subpopulations. In this paper, we argue that shortcut features should not be entirely discarded. Instead, if we can identify the subpopulation to which an input belongs, we can adaptively choose among models with different strengths to achieve high performance on both majority and minority subpopulations. We propose COnfidence-baSed MOdel Selection (CosMoS), where we observe that model confidence can effectively guide model selection. Notably, CosMoS does not require any target labels or group annotations, either of which may be difficult to obtain or unavailable. We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift. We find that CosMoS achieves 2-5% lower average regret across all subpopulations, compared to using only robust predictors or other model aggregation methods.
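The abstract reports results as average regret across subpopulations. Below is a minimal worked example of that arithmetic. It assumes that regret on a test set is the gap between the best candidate model's accuracy on that set and the evaluated method's accuracy, and that average regret is the mean of these gaps over test sets. The function name `average_regret` is hypothetical, and all accuracy numbers are made up for illustration. They are not results from the paper.

```python
def average_regret(method_acc, candidate_accs):
    """Mean regret over test sets.

    method_acc:     {test_set: accuracy of the method being evaluated}
    candidate_accs: {test_set: [accuracy of each candidate model]}
    Regret on one test set = best candidate accuracy - method accuracy
    (assumed definition, used here only for illustration).
    """
    regrets = [max(candidate_accs[s]) - method_acc[s] for s in method_acc]
    return sum(regrets) / len(regrets)


# Hypothetical accuracies in percentage points: [shortcut model, robust model].
candidates = {"majority": [95, 88], "minority": [60, 80]}
robust_only = {"majority": 88, "minority": 80}
per_subpopulation_choice = {"majority": 94, "minority": 79}

print(average_regret(robust_only, candidates))               # 3.5
print(average_regret(per_subpopulation_choice, candidates))  # 1.0
```

The robust-only baseline has zero regret on the minority set but loses 7 points on the majority set. A selector that stays close to the best model on both sets has lower average regret. This is the trade-off the abstract describes.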

What problem does this paper attempt to address?

This paper addresses how to select among machine learning models so that performance stays high on every subpopulation under distribution shift. Specifically, the authors focus on subpopulation shift: the proportions of the different subpopulations change between the training data and the test data.

### Problem Background

Real-world datasets often contain spurious correlations. Some features are strongly correlated with the label in the training data, but that correlation may no longer hold in the test data. For example, if most cows in a training set appear on grass and most camels appear on sand, a standard model may learn to classify by background. When the backgrounds change at test time, this reliance degrades the model's performance.

### Limitations of Existing Methods

Existing methods typically avoid these shortcut features and learn robust features instead. Robust predictors perform better under worst-case distribution shift, but they often sacrifice accuracy on majority subpopulations. Conversely, models that rely on spurious features perform well on majority subpopulations but poorly on minority subpopulations.

### Core Contributions of the Paper

To address this trade-off, the authors propose a method named COnfidence-baSed MOdel Selection (CosMoS). Its main ideas are:

1. **Identify Subpopulations**: Use model confidence to infer the subpopulation to which an input belongs.
2. **Adaptive Model Selection**: Choose the model best suited to the identified subpopulation.

### Specific Implementation of the Method

CosMoS requires neither target labels nor group annotations. This makes it practical in settings where such annotations are difficult to obtain or unavailable.
Specific steps include:

- **Calibrate the Model**: Use temperature scaling to calibrate each base model so that its predicted probabilities match its actual accuracy.
- **Cluster Test Embeddings**: Embed the test inputs into a low-dimensional space and partition them into clusters with K-means.
- **Select the Best Model**: For each cluster, select the model with the highest average confidence on that cluster and use it to make predictions for the cluster.

### Experimental Results

The authors evaluate CosMoS on four datasets with spurious correlations. Each dataset has multiple test sets with varying degrees of distribution shift. CosMoS achieves 2-5% lower average regret across all subpopulations than using only robust predictors or other model aggregation methods. In other words, CosMoS improves performance on minority subpopulations while maintaining high accuracy on majority subpopulations.

### Summary

This paper proposes CosMoS, which addresses subpopulation shift through confidence-based model selection. Instead of discarding shortcut features, the method uses them where they help. It improves accuracy on minority subpopulations without sacrificing accuracy on majority subpopulations, which gives more stable performance across a range of test distributions.

### Formula Representation

Some formulas involved in the paper are as follows:

- **Conditional Probability**:

  \[ p(y \mid x, z) = p_{T_i}(y \mid x, z) \]

  Given the input \(x\) and subpopulation \(z\), the conditional distribution of the label \(y\) is the same under the source distribution \(p\) and the target distribution \(p_{T_i}\).

- **Entropy Minimization Objective Function**:

  \[ \min_{z_i} \sum_i H\big(p(y \mid x, z_i)\big) + \sum_{j,k} \lambda_{jk}\, \mathrm{dist}\big(p(z_i \mid x_j) \,\|\, p(z_i \mid x_k)\big) \]

  Here \(H\) denotes entropy, \(\mathrm{dist}\) is a distance measure between distributions, and \(\lambda_{jk}\) is the weight of the smoothing regularization term.
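The three implementation steps listed under "Specific Implementation of the Method" (calibrate, cluster, select) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code:

- The function names (`softmax`, `fit_temperature`, `kmeans`, `select_by_confidence`) are hypothetical.
- The temperature is fit by grid search on a labeled in-distribution validation split, which is assumed to be available.
- K-means is a hand-rolled Lloyd's iteration with farthest-point initialization.
- "Confidence" is taken to be the maximum softmax probability.
- The test embeddings are assumed to be given.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_temperature(val_logits, val_labels, grid=np.linspace(0.05, 5.0, 100)):
    """Temperature scaling: return the scalar T that minimizes negative
    log-likelihood on a labeled validation split. A grid search stands in
    for the usual gradient-based fit (simplifying assumption)."""
    rows = np.arange(len(val_labels))
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        probs = softmax(val_logits / t)
        nll = -np.mean(np.log(probs[rows, val_labels] + 1e-12))
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t


def kmeans(x, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization.
    Returns one cluster id per row of x."""
    x = np.asarray(x, dtype=float)
    centers = [x[0]]
    for _ in range(1, k):
        # Next center: the point farthest from every center chosen so far.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = x[assign == c].mean(axis=0)
    return assign


def select_by_confidence(test_logits, temperatures, test_embeddings, k):
    """For each cluster of test embeddings, pick the base model with the highest
    mean calibrated confidence (max softmax probability) and predict with it."""
    probs = [softmax(logits / t) for logits, t in zip(test_logits, temperatures)]
    conf = np.stack([p.max(axis=1) for p in probs])  # shape: (num_models, n)
    clusters = kmeans(test_embeddings, k)
    preds = np.empty(len(clusters), dtype=int)
    chosen = {}
    for c in np.unique(clusters):
        mask = clusters == c
        best = int(conf[:, mask].mean(axis=1).argmax())
        chosen[int(c)] = best
        preds[mask] = probs[best][mask].argmax(axis=1)
    return preds, chosen


if __name__ == "__main__":
    # Made-up toy data: two well-separated groups of test embeddings.
    emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
    # Model 0 is confident only on the first group; model 1 only on the second.
    logits_m0 = np.array([[5.0, 0.0]] * 3 + [[0.1, 0.0]] * 3)
    logits_m1 = np.array([[0.0, 0.1]] * 3 + [[0.0, 5.0]] * 3)
    preds, chosen = select_by_confidence([logits_m0, logits_m1], [1.0, 1.0], emb, k=2)
    print(chosen)  # {0: 0, 1: 1}
    print(preds)   # [0 0 0 1 1 1]
```

In the toy run, each cluster is routed to the model that is confident on it. The choice is driven by the confidence averaged over the whole cluster rather than by any single input's confidence.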

Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts

Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters

Conservative Prediction via Data-Driven Confidence Minimization

Robust Conformal Prediction under Distribution Shift via Physics-Informed Structural Causal Model

Evaluating Model Performance Under Worst-case Subpopulations

Predictive Multiplicity in Probabilistic Classification

Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity

Causally Regularized Learning with Agnostic Data Selection Bias

COSSMO: predicting competitive alternative splice site selection using deep learning

A Robust Classifier Under Missing-Not-At-Random Sample Selection Bias

Robust Validation: Confident Predictions Even When Distributions Shift

Structurally Aware Robust Model Selection for Mixtures

Robustness to Spurious Correlations via Human Annotations

Fairer and more accurate, but for whom?

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

Robust Model Selection with Application in Single-Cell Multiomics Data

Compositional Risk Minimization

SCOD: From Heuristics to Theory

Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions

Change is Hard: A Closer Look at Subpopulation Shift