Abstract: Figuring out which Pre-Trained Model (PTM) from a model zoo fits the target task is essential to take advantage of plentiful model resources. With the availability of numerous heterogeneous PTMs from diverse fields, efficiently selecting the most suitable PTM is challenging due to the time-consuming costs of carrying out forward or backward passes over all PTMs. In this paper, we propose Model Spider, which tokenizes both PTMs and tasks by summarizing their characteristics into vectors to enable efficient PTM selection. By leveraging the approximated performance of PTMs on a separate set of training tasks, Model Spider learns to construct tokens and measure the fitness score between a model-task pair via their tokens. The ability to rank relevant PTMs higher than others generalizes to new tasks. With the top-ranked PTM candidates, we further learn to enrich task tokens with their PTM-specific semantics to re-rank the PTMs for better selection. Model Spider balances efficiency and selection ability, making PTM selection like a spider preying on a web. Model Spider demonstrates promising performance in various configurations of model zoos.

What problem does this paper attempt to address?

The paper addresses how to efficiently and accurately select, from a large collection of pre-trained models (PTMs), the one best suited to a specific downstream task. As the number and variety of PTMs grow, fine-tuning every model to evaluate its performance becomes impractical because it requires a large amount of computing resources and time. Existing methods can estimate a model's transferability through forward propagation, but they remain inefficient when facing large and heterogeneous model zoos.

To solve these problems, the authors propose **MODEL SPIDER**, a framework that represents PTMs and tasks as vectors (i.e., "tokens") and learns the similarity between these vectors in order to select the most suitable PTM efficiently. Its main contributions are:

1. **Efficient PTM selection**: By representing PTMs and tasks as vectors (tokens), MODEL SPIDER can quickly evaluate how well each PTM matches a task without a forward pass through every PTM.
2. **Learning tokens and similarity measures**: MODEL SPIDER learns how to construct tokens on a separate set of training tasks and learns, with supervision, to measure the similarity between PTM tokens and task tokens.
3. **Flexible integration of forward-pass results**: When resources permit, MODEL SPIDER can use the forward-pass results of a few top-ranked PTMs to enrich the semantic representation of the tokens and refine the final PTM ranking.

### Formula representation

- **Generation of task tokens**:
  \[ \mu(T)=\left\{\frac{1}{|I(y_{i} = c)|}\sum_{(x_{i},y_{i})\in T}[\psi(x_{i})\cdot I(y_{i}=c)]\right\}_{c\in[C]} \]
  where $\psi$ is an additional frozen encoder used to extract features of task instances and $I(\cdot)$ is the indicator function, so each class $c$ contributes the mean feature of its instances.
- **Model-task similarity measure**:
  \[ \mathrm{sim}(\theta_{m},\mu(T)) = \mathrm{FC}(\mathrm{transformer}(z)[0]) \]
  where $z = [\theta_{m},\mu(T)]\in\mathbb{R}^{d\times(1 + C)}$, and $\mathrm{FC}$ is a fully-connected layer used to project the intermediate result to a scalar.
- **Ranking loss function**:
  \[ \ell_{\mathrm{rank}}(\hat{t},t)=\sum_{m = 1}^{M}-\log\left(\frac{\exp(\hat{t}_{\mathrm{dsc}(m)})}{\sum_{l = m}^{M}\exp(\hat{t}_{\mathrm{dsc}(l)})}\right) \]
  where $\mathrm{dsc}(m)$ is the index of the PTM with the $m$-th largest ground-truth score.

In this way, MODEL SPIDER improves the efficiency of PTM selection and can also provide more accurate selection results when resources are limited.
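The task-token and ranking-loss formulas can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `task_tokens` and `rank_loss` are hypothetical, the features are assumed to be already extracted by the frozen encoder $\psi$, classes without samples are assumed to map to a zero vector, and the transformer-based similarity is left out because it needs a trained network.

```python
import math


def task_tokens(features, labels, num_classes):
    """Class-wise mean of pre-extracted features psi(x_i): one token per class.

    `features` is a list of equal-length float lists; running the frozen
    encoder psi itself is outside the scope of this sketch.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(features, labels):
        counts[label] += 1
        for j, value in enumerate(feat):
            sums[label][j] += value
    # Assumption of this sketch: a class with no samples keeps a zero vector.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else sums[c]
        for c in range(num_classes)
    ]


def rank_loss(predicted, ground_truth):
    """Sum over m of -log softmax of the m-th ranked score over the tail l >= m.

    dsc(m) is the index of the PTM with the m-th largest ground-truth score.
    """
    dsc = sorted(range(len(ground_truth)),
                 key=lambda i: ground_truth[i], reverse=True)
    ordered = [predicted[i] for i in dsc]
    loss = 0.0
    for m in range(len(ordered)):
        tail = ordered[m:]
        peak = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in tail))
        loss += log_norm - ordered[m]
    return loss


# Toy data: three instances with 2-d features, two classes, three PTMs.
tokens = task_tokens([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]], [0, 0, 1], num_classes=2)
truth = [0.9, 0.5, 0.1]                        # made-up ground-truth PTM scores
aligned = rank_loss([2.0, 1.0, 0.0], truth)    # predictions agree with the truth order
reversed_ = rank_loss([0.0, 1.0, 2.0], truth)  # predictions invert the truth order
```

On this toy input, predictions that order the PTMs the same way as the ground truth give a lower loss than predictions that invert the order, which is the behavior the ranking loss is meant to reward.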

Model Spider: Learning to Rank Pre-Trained Models Efficiently
