Invariant-Feature Subspace Recovery: A New Class of Provable Domain Generalization Algorithms
Haoxiang Wang, Gargi Balasubramaniam, Haozhe Si, Bo Li, Han Zhao
DOI: https://doi.org/10.48550/arXiv.2311.00966
2023-01-01
Abstract: Domain generalization asks for models trained over a set of training environments to generalize well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) have been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this work, we propose Invariant-feature Subspace Recovery (ISR): a new class of algorithms to achieve provable domain generalization across the settings of classification and regression problems. First, in the binary classification setup of Rosenfeld et al. (2021), we show that our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments. Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Next, we extend ISR-Mean to the more general setting of multi-class classification and propose ISR-Multiclass, which leverages class information and provably recovers the invariant-feature subspace with $\lceil d_s/k\rceil+1$ training environments for $k$-class classification. Finally, for regression problems, we propose ISR-Regression, which can identify the invariant-feature subspace with $d_s+1$ training environments. Empirically, we demonstrate the superior performance of our ISRs on synthetic benchmarks. Further, ISR can be used as a post-processing method for feature extractors such as neural nets.
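
The idea behind ISR-Mean, recovering a subspace from first-order moments of the class-conditional distributions, can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation. It assumes a linear data model in which the class-conditional mean shifts across environments only along spurious directions, so the directions of near-zero variance among the per-environment means are taken as the invariant-feature subspace. The function name `isr_mean_subspace`, the `(X, y)` input layout, the `{-1, +1}` label encoding, and the assumption that `d_s` is known are all hypothetical choices made for illustration.

```python
import numpy as np


def isr_mean_subspace(envs, d_s):
    """Estimate directions along which class-conditional means do not vary.

    envs: list of (X, y) pairs, one per training environment (assumed layout),
          with X of shape (n, d) and labels y in {-1, +1}.
    d_s:  dimension of the spurious-feature subspace, assumed known here.
    Returns an orthonormal matrix P of shape (d, d - d_s).
    """
    # First-order moment of the class-conditional distribution (class +1),
    # computed separately in each environment: shape (E, d).
    means = np.stack([X[y == 1].mean(axis=0) for X, y in envs])

    # Covariance of those means across environments.
    centered = means - means.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(envs)

    # eigh returns eigenvalues in ascending order, so the first d - d_s
    # eigenvectors are the directions in which the means vary least.
    _, eigvecs = np.linalg.eigh(cov)
    d = cov.shape[0]
    return eigvecs[:, : d - d_s]
```

Under these assumptions, features could then be projected with `X @ P` before fitting any standard linear classifier on the pooled training data. The sketch needs the means to vary in all `d_s` spurious directions, which is consistent with the abstract's requirement of at least $d_s+1$ training environments.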