Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles

Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester
2024-11-20
Abstract: The quality of self-supervised pre-trained embeddings on out-of-distribution (OOD) data is poor without fine-tuning. A straightforward and simple approach to improving the generalization of pre-trained representation to OOD data is the use of deep ensembles. However, obtaining an effective ensemble in the embedding space with only unlabeled data remains an unsolved problem. We first perform a theoretical analysis that reveals the relationship between individual hyperspherical embedding spaces in an ensemble. We then design a principled method to align these embedding spaces in an unsupervised manner. Experimental results on the MNIST dataset show that our embedding-space ensemble method improves pre-trained embedding quality on in-distribution and OOD data compared to single encoders.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper asks **how to improve the generalization of pre-trained encoders to out-of-distribution (OOD) data**. Specifically, the authors focus on self-supervised pre-trained encoders, which perform poorly on OOD data. Deep ensembles can improve a model's predictive performance, but building an effective ensemble in the embedding space remains an unsolved problem, especially when only unlabeled data is available.

### Specific description of the problem

1. **Poor performance of pre-trained encoders on OOD data**:
   - Self-supervised learning makes it possible to pre-train deep neural network (DNN) encoders on large amounts of unlabeled data. After fine-tuning, these pre-trained encoders transfer well to a variety of downstream tasks.
   - Without fine-tuning, however, the quality of the pre-trained features is significantly lower on OOD data. This hurts performance on OOD downstream tasks, especially when downstream data is scarce.
2. **Limitations of existing ensemble methods**:
   - Deep ensembles (DEs) improve prediction performance by training multiple DNNs with different initializations and data orders.
   - Existing ensemble methods usually aggregate models in the prediction output space or in the weight space, and each choice has its own limitations:
     - Aggregating in the prediction output space applies only to specific tasks (such as classification) and is not suitable for self-supervised pre-trained encoders.
     - Aggregating in the weight space is flexible but sacrifices interpretability.

### Solution

The authors propose an **embedding-space ensemble** method named **Ensemble-InfoNCE**, which aims to improve the zero-shot generalization of pre-trained encoders to OOD data by aligning the embedding spaces of the ensemble members. The approach has three parts:

1. **Theoretical analysis**:
   - Reveal the orthogonal-transformation relationship between the different embedding spaces.
   - Prove that the aligned embedding-space ensemble can recover the correct latent variables.
2. **Unsupervised alignment**:
   - Propose an unsupervised method to align the embedding spaces, ensuring that semantically similar samples have similar directions in the hyperspherical embedding space.
   - Use the Karcher mean algorithm to compute the mean vector of the aligned embeddings.
3. **Experimental verification**:
   - Run experiments on the MNIST dataset. The results show that Ensemble-InfoNCE yields better embedding quality on both in-distribution (ID) and OOD data than a single encoder and than unaligned ensembles.

In this way, the authors improve the generalization of pre-trained encoders to OOD data and address the limitations of existing methods for ensembling in the embedding space.
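
The alignment and averaging steps described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that each encoder embeds the same unlabeled inputs, so rows can be paired without labels. It also swaps in the classical orthogonal Procrustes solution (via SVD) as the alignment step, since the summary does not specify the paper's exact procedure. The synthetic "encoders" are hypothetical: each is a randomly rotated, slightly noisy copy of a shared latent on the unit hypersphere. The Karcher mean is computed with the standard log-map / exp-map iteration on the sphere.

```python
import numpy as np

def normalize(x, eps=1e-12):
    """Project vectors (last axis) onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def procrustes_rotation(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def log_map(base, points):
    """Tangent vectors at `base` pointing toward each row of `points`."""
    cos = np.clip(points @ base, -1.0, 1.0)
    theta = np.arccos(cos)[:, None]              # geodesic distances
    residual = points - cos[:, None] * base      # component orthogonal to base
    norm = np.linalg.norm(residual, axis=1, keepdims=True)
    return np.where(norm > 1e-12, theta * residual / np.maximum(norm, 1e-12), 0.0)

def exp_map(base, tangent):
    """Move from `base` along the geodesic given by `tangent`."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def karcher_mean(points, iters=50, tol=1e-10):
    """Iterative Karcher (Frechet) mean of unit vectors; assumes they are
    concentrated enough that their Euclidean mean is non-zero."""
    mean = normalize(points.mean(axis=0))
    for _ in range(iters):
        step = log_map(mean, points).mean(axis=0)
        mean = normalize(exp_map(mean, step))
        if np.linalg.norm(step) < tol:
            break
    return mean

# Hypothetical setup: 3 "encoders" = rotated, noisy views of one latent.
rng = np.random.default_rng(0)
n, d, m = 200, 8, 3
latent = normalize(rng.normal(size=(n, d)))
encoders = []
for _ in range(m):
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
    encoders.append(normalize(latent @ q + 0.05 * rng.normal(size=(n, d))))

# Align every encoder to the first one, then average per sample on the sphere.
reference = encoders[0]
rotations = [procrustes_rotation(e, reference) for e in encoders[1:]]
aligned = [reference] + [e @ r for e, r in zip(encoders[1:], rotations)]
stacked = np.stack(aligned, axis=1)                # shape (n, m, d)
ensemble = np.stack([karcher_mean(stacked[i]) for i in range(n)])

cos_before = float(np.mean(np.sum(encoders[1] * reference, axis=1)))
cos_after = float(np.mean(np.sum(aligned[1] * reference, axis=1)))
print(f"mean cosine to reference: before={cos_before:.3f}, after={cos_after:.3f}")
```

Averaging the unaligned embeddings directly would mix vectors expressed in different rotated frames, which is why the alignment step comes first. Choosing the first encoder as the reference frame is an arbitrary choice made for this sketch.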