Abstract:The quality of self-supervised pre-trained embeddings on out-of-distribution (OOD) data is poor without fine-tuning. A straightforward and simple approach to improving the generalization of pre-trained representation to OOD data is the use of deep ensembles. However, obtaining an effective ensemble in the embedding space with only unlabeled data remains an unsolved problem. We first perform a theoretical analysis that reveals the relationship between individual hyperspherical embedding spaces in an ensemble. We then design a principled method to align these embedding spaces in an unsupervised manner. Experimental results on the MNIST dataset show that our embedding-space ensemble method improves pre-trained embedding quality on in-distribution and OOD data compared to single encoders.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **How to improve the generalization ability of pre - trained encoders on out - of - distribution (OOD) data**. Specifically, the author focuses on the problem that pre - trained encoders in self - supervised learning perform poorly when facing OOD data. Although traditional deep ensemble methods can improve the prediction performance of models, effective integration in the embedding space remains an unsolved problem, especially when there is only unlabeled data. ### Specific description of the problem 1. **Poor performance of pre - trained encoders on OOD data**: - Self - supervised learning techniques enable the pre - training of deep neural network (DNN) encoders on a large amount of unlabeled data. These pre - trained encoders can be well transferred to various downstream tasks after fine - tuning. - However, without fine - tuning, the quality of pre - trained features on OOD data is significantly lower, which will affect the performance of subsequent OOD downstream tasks, especially when the downstream task data is insufficient. 2. **Limitations of existing ensemble methods**: - Deep ensembles (DEs) can improve prediction performance by training multiple DNNs with different initializations and data orders. - Existing ensemble methods usually aggregate models in the prediction output space or the weight space, but these methods have their own limitations: - Aggregating models in the prediction output space can only be applied to specific tasks (such as classification) and is not suitable for self - supervised pre - trained encoders. - Aggregating models in the weight space is flexible but sacrifices interpretability. ### Solution The author proposes a new method, namely **embedding - space ensemble**, named **Ensemble - InfoNCE**, which aims to improve the zero - shot generalization ability of pre - trained encoders on OOD data by aligning the embedding space. The specific steps include: 1. **Theoretical analysis**: - Reveal the orthogonal transformation relationship between different embedding spaces. - Prove that the aligned embedding - space ensemble can recover the correct latent variables. 2. **Unsupervised alignment**: - Propose an unsupervised method to align the embedding space, ensuring that semantically similar samples have similar directions in the hyperspherical embedding space. - Use the Karcher Mean algorithm to calculate the mean vector of the aligned embedding space. 3. **Experimental verification**: - Conduct experiments on the MNIST dataset, and the results show that Ensemble - InfoNCE has better embedding quality on both ID and OOD data than a single encoder and unaligned ensemble methods. Through this method, the author successfully improves the generalization ability of pre - trained encoders on OOD data and solves the limitations of existing methods in embedding - space integration.

Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles

APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

Disentangling Factors of Variation in Deep Representations Using Adversarial Training.

Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy Maximization

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Holistic Sentence Embeddings for Better Out-of-Distribution Detection

Scalable Ensemble Diversification for OOD Generalization and Detection

The Role of Pretrained Representations for the OOD Generalization of Reinforcement Learning Agents

Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

EnAET: A Self-Trained Framework for Semi-Supervised and Supervised Learning With Ensemble Transformations

Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations

Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations

GeRA: Label-Efficient Geometrically Regularized Alignment

Predicting the Performance of Foundation Models via Agreement-on-the-Line

Starbucks: Improved Training for 2D Matryoshka Embeddings

Three approaches to facilitate invariant neurons and generalization to out-of-distribution orientations and illuminations

Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches

Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning