Abstract: The proliferation of pretrained models, as a result of advancements in pretraining techniques, has led to the emergence of a vast zoo of publicly available models. Effectively utilizing these resources to obtain models with robust out-of-distribution generalization capabilities for downstream tasks has become a crucial area of research. Previous research has primarily focused on identifying the most powerful models within the model zoo, neglecting to fully leverage the diverse inductive biases contained within. This paper argues that the knowledge contained in weaker models is valuable and presents a method for leveraging the diversity within the model zoo to improve out-of-distribution generalization capabilities. Specifically, we investigate the behaviors of various pretrained models across different domains of downstream tasks by characterizing the variations in their encoded representations in terms of two dimensions: diversity shift and correlation shift. This characterization enables us to propose a new algorithm for integrating diverse pretrained models, not limited to the strongest models, in order to achieve enhanced out-of-distribution generalization performance. Our proposed method demonstrates state-of-the-art empirical results on a variety of datasets, thus validating the benefits of utilizing diverse knowledge.

What problem does this paper attempt to address?

The paper primarily addresses the generalization of machine learning models to out-of-distribution (OOD) data, and in particular how to leverage pretrained models (PTMs) to improve performance on unseen data distributions. The core questions are:

1. **How to make effective use of the many available pretrained models**: Advances in pretraining techniques have produced a large number of publicly available pretrained models. These models differ in architecture, training data, and pretraining strategy, and therefore carry diverse inductive biases. The paper points out that previous work usually selects only the strongest models for ensembling or transfer learning, neglecting the potentially valuable knowledge in other, "weaker" models.
2. **How to improve OOD generalization**: In practice, machine learning models often encounter OOD data, where the test data is drawn from a different distribution than the training data. Under such shifts, even models that perform well on standard benchmarks can degrade. The paper aims to improve OOD generalization by exploiting the diversity among different pretrained models.

To achieve these goals, the paper proposes the following key methods:

- **Analyzing the behavior of different pretrained models**: The paper defines two metrics, Feature Diversity Shift and Feature Correlation Shift, to quantify how differently each pretrained model behaves on a given domain generalization task. This characterizes each model and its sensitivity to specific types of distribution shift.
- **Proposing a new ensemble algorithm**: Using these metrics, the paper designs an ensemble algorithm that integrates the diverse knowledge of different pretrained models.
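The two metrics can be illustrated with a toy estimator. This is a minimal sketch under stated assumptions, not the paper's implementation: a model's features are reduced to one dimension and discretized into histogram bins; diversity shift is taken as half the probability mass lying on bins that only one domain covers, and correlation shift as the disagreement between the two domains' P(y | z) on shared bins, weighted by sqrt(p(z) q(z)). The names `shift_metrics` and `dominant_shift`, and the rule of tagging a model by whichever shift is larger, are hypothetical.

```python
import numpy as np


def _bin_stats(z, y, edges, num_classes):
    """Histogram P(bin) and conditional P(y | bin) for 1-D features z."""
    n_bins = len(edges) - 1
    # Inner edges only, so out-of-range values fall into the first/last bin.
    bins = np.digitize(z, edges[1:-1])
    counts = np.zeros((n_bins, num_classes))
    np.add.at(counts, (bins, y), 1.0)
    per_bin = counts.sum(axis=1)
    p_z = per_bin / per_bin.sum()
    # Empty bins get an all-zero conditional; they are masked out later.
    p_y_given_z = np.divide(counts, per_bin[:, None],
                            out=np.zeros_like(counts),
                            where=per_bin[:, None] > 0)
    return p_z, p_y_given_z


def shift_metrics(z_p, y_p, z_q, y_q, num_classes, n_bins=20):
    """Histogram estimates of (diversity shift, correlation shift).

    Assumption: z_* are 1-D feature values from two domains, y_* are
    integer labels in [0, num_classes).
    """
    lo = min(z_p.min(), z_q.min())
    hi = max(z_p.max(), z_q.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, p_y = _bin_stats(z_p, y_p, edges, num_classes)
    q, q_y = _bin_stats(z_q, y_q, edges, num_classes)

    # Diversity shift: mass on bins covered by only one of the two domains.
    unshared = (p * q) == 0
    diversity = 0.5 * np.abs(p - q)[unshared].sum()

    # Correlation shift: on shared bins, how much P(y | z) disagrees.
    shared = ~unshared
    label_gap = np.abs(p_y[shared] - q_y[shared]).sum(axis=1)
    correlation = 0.5 * np.sum(np.sqrt(p[shared] * q[shared]) * label_gap)
    return float(diversity), float(correlation)


def dominant_shift(diversity, correlation):
    """Hypothetical rule: tag a model by whichever shift is larger."""
    return "diversity" if diversity >= correlation else "correlation"
```

For example, features whose value ranges do not overlap across domains give a diversity shift near 1 and a correlation shift of 0, while identical feature distributions with flipped labels give the reverse.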
The proposed ensemble algorithm has two key components: a sample reweighting module and an independence penalty module. The reweighting module uses the outputs of models whose feature shift is dominated by correlation shift to balance the weights of data subgroups. The independence penalty requires the main classifier's output to be independent of features that exhibit significant diversity shift, reducing the model's sensitivity to these specific types of distribution shift. Experiments show that the method can exploit models usually regarded as weak to improve overall performance, outperforming the best existing methods on multiple image classification benchmarks.
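A minimal sketch of how the two components could be combined into one training objective, under assumptions the summary does not state: subgroups are defined by the pair (true label, prediction of a correlation-shift-dominated model) and balanced with inverse-frequency weights, and the independence penalty is swapped in as the Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels, measured between the main classifier's logits and the features of a diversity-shift-dominated model. All function names and the weight `lam` are hypothetical, and the paper's actual weighting scheme and penalty may differ. The code only evaluates the loss with NumPy; training would require an autodiff framework.

```python
import numpy as np


def subgroup_weights(labels, aux_preds):
    """Inverse-frequency weights over (label, auxiliary prediction) subgroups."""
    # Unique integer id per (label, aux_pred) pair; both are non-negative ints.
    group_ids = labels * (aux_preds.max() + 1) + aux_preds
    _, inverse, counts = np.unique(group_ids, return_inverse=True,
                                   return_counts=True)
    w = 1.0 / counts[inverse]
    return w * len(w) / w.sum()  # rescale so the mean weight is 1


def _gaussian_gram(x, sigma):
    """Gaussian-kernel Gram matrix from pairwise squared distances."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(x, z, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels: tr(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    k = _gaussian_gram(x, sigma)
    l = _gaussian_gram(z, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def weighted_cross_entropy(logits, labels, weights):
    """Per-sample weighted cross-entropy via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(weights * nll))


def ensemble_loss(logits, labels, aux_preds, div_features, lam=1.0):
    """Subgroup-reweighted loss plus an HSIC independence penalty.

    aux_preds:    predictions of a correlation-shift-dominated model.
    div_features: features of a diversity-shift-dominated model.
    """
    w = subgroup_weights(labels, aux_preds)
    return (weighted_cross_entropy(logits, labels, w)
            + lam * hsic(logits, div_features))
```

With `lam=0` the objective reduces to subgroup-balanced cross-entropy; larger values push the classifier's logits harder toward statistical independence from the diversity-shifted features.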

Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization
