What Is Missing in IRM Training and Evaluation? Challenges and Solutions

Yihua Zhang, Pranay Sharma, Parikshit Ram, Mingyi Hong, Kush Varshney, Sijia Liu
2023-03-04
Abstract: Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions, and as a principled solution for preventing spurious correlations from being learned and for improving models' out-of-distribution generalization. Yet, recent works have found that the optimality of the originally-proposed IRM optimization (IRM) may be compromised in practice or could be impossible to achieve in some scenarios. Therefore, a series of advanced IRM algorithms have been developed that show practical improvement over IRM. In this work, we revisit these recent IRM advancements, and identify and resolve three practical limitations in IRM training and evaluation. First, we find that the effect of batch size during training has been chronically overlooked in previous studies, leaving room for further improvement. We propose small-batch training and highlight the improvements over a set of large-batch optimization techniques. Second, we find that improper selection of evaluation environments could give a false sense of invariance for IRM. To alleviate this effect, we leverage diversified test-time environments to precisely characterize the invariance of IRM when applied in practice. Third, we revisit the proposal of Ahuja et al. (2020) to convert IRM into an ensemble game and identify a limitation when a single invariant predictor is desired instead of an ensemble of individual predictors. We propose a new IRM variant to address this limitation based on a novel viewpoint of ensemble IRM games as consensus-constrained bi-level optimization. Lastly, we conduct extensive experiments (covering 7 existing IRM variants and 7 datasets) to justify the practical significance of revisiting IRM training and evaluation in a principled manner.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper targets three issues in current Invariant Risk Minimization (IRM) training and evaluation that limit the performance and reliability of IRM in practice:

1. **Large-batch training**
   - Most existing IRM methods train with large batches, which can leave the model stuck in sub-optimal solutions, unable to escape bad local optima.
   - The authors propose small-batch training and show that it outperforms a range of large-batch optimization techniques.
2. **Improper selection of evaluation environments**
   - Evaluating on a single test environment can give an inaccurate picture of invariance and lead to misjudging IRM performance.
   - The authors recommend evaluating on diverse test environments to measure the invariance of IRM more accurately.
3. **Producing a single invariant predictor**
   - IRM-Game learns invariance by assigning a separate prediction head to each environment, so it ends up with a set of predictors rather than a single invariant predictor.
   - The authors propose BLOC-IRM, which merges the per-environment prediction heads into a single invariant predictor through a bi-level optimization (BLO) framework with consensus constraints.

### Formula Summary

1. **IRM bi-level optimization (BLO) problem**:

\[ \min_{\theta} \sum_{e \in E_{\text{tr}}} \ell^{(e)}(w^*(\theta) \circ \theta) \]
\[ \text{subject to } w^*(\theta) \in \arg\min_{\bar{w}} \ell^{(e)}(\bar{w} \circ \theta), \quad \forall e \in E_{\text{tr}} \]

2. **IRMv1 relaxed single-level optimization problem**:

\[ \min_{\theta} \sum_{e \in E_{\text{tr}}} \left[ \ell^{(e)}(\theta) + \gamma \left\| \nabla_w \ell^{(e)}(w \circ \theta) \big|_{w = 1.0} \right\|_2^2 \right] \]

3. **BLOC-IRM objective function**:

\[ \min_{\theta} \sum_{e \in E_{\text{tr}}} \left[ \ell^{(e)}(w^*(\theta) \circ \theta) + \gamma \left\| \nabla_w \ell^{(e)}(w^*(\theta) \circ \theta) \right\|_2^2 \right] \]
\[ \text{subject to } (I): w^{(e)}(\theta) \in \arg\min_{\bar{w}^{(e)}} \ell^{(e)}(\bar{w}^{(e)} \circ \theta), \quad \forall e \in E_{\text{tr}}, \]
\[ (II): w^*(\theta) = \frac{1}{N} \sum_{e \in E_{\text{tr}}} w^{(e)}(\theta) \]

Here \(N\) denotes the number of training environments, so constraint (II) sets the consensus head to the average of the per-environment heads.

### Conclusion

Together, these changes expose shortcomings in existing IRM methods and offer concrete remedies, improving the generalization and stability of IRM across datasets. In particular, experiments on multiple benchmark datasets show that BLOC-IRM can significantly outperform existing IRM variants.
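To make the IRMv1 objective in the formula summary more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary classification task with a logistic loss, where each environment is represented only by the logits produced by the model \(\theta\) and the matching 0/1 labels, and where the dummy classifier \(w\) is a scalar that multiplies the logits. Under those assumptions the gradient with respect to \(w\) at \(w = 1.0\) has a closed form. All function names (`env_risk`, `irmv1_penalty`, `irmv1_objective`, `consensus_head`) are hypothetical; `consensus_head` only illustrates the averaging in constraint (II) of BLOC-IRM.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def env_risk(logits, labels):
    """Mean binary cross-entropy of one environment, computed from logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # log(1 + exp(z)) - y * z, written in an overflow-safe form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(logits)


def irmv1_penalty(logits, labels):
    """Squared gradient of the risk w.r.t. a scalar head w, taken at w = 1.0.

    With loss(w) = BCE(w * z, y), d loss / d w at w = 1 is (sigmoid(z) - y) * z.
    """
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad ** 2


def irmv1_objective(envs, gamma):
    """Sum over environments of risk + gamma * penalty (assumed scalar-head setup)."""
    return sum(env_risk(z, y) + gamma * irmv1_penalty(z, y) for z, y in envs)


def consensus_head(heads):
    """Average per-environment heads coordinate-wise, as in constraint (II)."""
    return [sum(coords) / len(heads) for coords in zip(*heads)]


if __name__ == "__main__":
    # Two toy environments: (logits, labels). The values are illustrative only.
    envs = [([1.0, -2.0, 0.5], [1, 0, 0]), ([0.3, -0.7, 2.0], [1, 0, 1])]
    print(irmv1_objective(envs, gamma=10.0))
    print(consensus_head([[1.0, 3.0], [3.0, 5.0]]))
```

The penalty is zero exactly when scaling the logits up or down cannot lower that environment's risk, which is the sense in which the fixed head \(w = 1.0\) is already optimal for the environment. In a real training loop the gradient would normally come from automatic differentiation rather than this closed form.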