Model Spider: Learning to Rank Pre-Trained Models Efficiently

Yi-Kai Zhang, Ting-Ji Huang, Yao-Xiang Ding, De-Chuan Zhan, Han-Jia Ye
2023-06-07
Abstract: Figuring out which Pre-Trained Model (PTM) from a model zoo fits the target task is essential to take advantage of plentiful model resources. With the availability of numerous heterogeneous PTMs from diverse fields, efficiently selecting the most suitable PTM is challenging due to the time-consuming costs of carrying out forward or backward passes over all PTMs. In this paper, we propose Model Spider, which tokenizes both PTMs and tasks by summarizing their characteristics into vectors to enable efficient PTM selection. By leveraging the approximated performance of PTMs on a separate set of training tasks, Model Spider learns to construct tokens and measure the fitness score between a model-task pair via their tokens. The ability to rank relevant PTMs higher than others generalizes to new tasks. With the top-ranked PTM candidates, we further learn to enrich task tokens with their PTM-specific semantics to re-rank the PTMs for better selection. Model Spider balances efficiency and selection ability, making PTM selection like a spider preying on a web. Model Spider demonstrates promising performance in various configurations of model zoos.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to select, efficiently and accurately, the pre-trained model (PTM) best suited to a specific downstream task from a large collection of PTMs. As the number and variety of PTMs grow, fine-tuning every candidate to evaluate its performance becomes impractical because of the computing resources and time required. Existing methods can estimate a model's transferability from forward passes alone, but they still run into efficiency problems on large, heterogeneous model zoos.

To solve these problems, the authors propose **Model Spider**, a framework that represents PTMs and tasks as vectors ("tokens") and learns a similarity between those vectors in order to select the most suitable PTM efficiently. Its main contributions are:

1. **Efficient PTM selection**: By representing PTMs and tasks as tokens, Model Spider can quickly score how well each PTM matches a task without running a forward pass through every PTM.
2. **Learning tokens and similarity measures**: Model Spider learns to construct tokens on a separate set of training tasks and, through supervised learning, to measure the similarity between PTM tokens and task tokens.
3. **Flexible integration of forward-pass results**: When resources permit, Model Spider uses forward-pass results from a few top-ranked PTMs to enrich the task tokens with PTM-specific semantics and refine the final PTM ranking.

### Formula representation

- **Generation of task tokens**:
  \[ \mu(T)=\left\{\frac{1}{|\mathbb{I}(y_{i} = c)|}\sum_{(x_{i},y_{i})\in T}[\psi(x_{i})\cdot \mathbb{I}(y_{i}=c)]\right\}_{c\in[C]} \]
  where $\psi$ is an additional frozen encoder used to extract features of task instances, $\mathbb{I}(\cdot)$ is the indicator function, and $|\mathbb{I}(y_{i} = c)|$ counts the instances of class $c$. Each of the $C$ task tokens is therefore the mean encoded feature of one class.
- **Model-task similarity measure**:
  \[ \mathrm{sim}(\theta_{m},\mu(T)) = \mathrm{FC}(\mathrm{transformer}(z)[0]) \]
  where $z = [\theta_{m},\mu(T)]\in\mathbb{R}^{d\times(1 + C)}$ concatenates the model token $\theta_{m}$ with the $C$ task tokens, $[0]$ selects the transformer output at the first position, and $\mathrm{FC}$ is a fully connected layer that projects this intermediate result to a scalar.
- **Ranking loss function**:
  \[ \ell_{\mathrm{rank}}(\hat{t},t)=\sum_{m = 1}^{M}-\log\left(\frac{\exp(\hat{t}_{\mathrm{dsc}(m)})}{\sum_{l = m}^{M}\exp(\hat{t}_{\mathrm{dsc}(l)})}\right) \]
  where $\hat{t}$ and $t$ are the predicted and ground-truth scores of the $M$ PTMs, and $\mathrm{dsc}(m)$ is the index of the PTM with the $m$-th largest ground-truth score.

In this way, Model Spider makes PTM selection more efficient while still providing accurate selection results when resources are limited.
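The task-token and ranking-loss formulas above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `task_tokens` and `listwise_rank_loss` are hypothetical, the feature matrix stands in for the outputs $\psi(x_i)$ of the frozen encoder, and classes with no instances are assumed to get a zero token.

```python
import numpy as np


def task_tokens(features, labels, num_classes):
    """Return one token per class: the mean feature of that class.

    features: (N, d) array standing in for psi(x_i) (assumed precomputed).
    labels:   (N,) integer class labels in [0, num_classes).
    """
    tokens = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # assumption: empty classes keep a zero token
            tokens[c] = features[mask].mean(axis=0)
    return tokens


def listwise_rank_loss(pred, truth):
    """Ranking loss from the formula above.

    pred:  (M,) predicted scores t_hat, one per PTM.
    truth: (M,) ground-truth scores t, one per PTM.
    """
    # dsc(1), ..., dsc(M): PTM indices sorted by descending ground truth.
    order = np.argsort(-truth, kind="stable")
    s = pred[order]
    loss = 0.0
    for m in range(len(s)):
        tail = s[m:]  # scores of dsc(m), ..., dsc(M)
        mx = tail.max()
        # Numerically stable log(sum(exp(tail))).
        log_sum_exp = mx + np.log(np.exp(tail - mx).sum())
        loss += log_sum_exp - s[m]
    return loss


# Small demonstration with made-up numbers.
demo_features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
demo_labels = np.array([0, 0, 1])
demo_tokens = task_tokens(demo_features, demo_labels, num_classes=2)

demo_truth = np.array([0.9, 0.5, 0.1])
loss_agree = listwise_rank_loss(np.array([3.0, 2.0, 1.0]), demo_truth)
loss_reversed = listwise_rank_loss(np.array([1.0, 2.0, 3.0]), demo_truth)
```

In the demonstration, predictions that order the PTMs the same way as the ground truth give a smaller loss than predictions in the reverse order, which is the behaviour the ranking loss is meant to encourage.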