Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

Yi Xin, Siqi Luo, Haodi Zhou, Junlong Du, Xiaohong Liu, Yue Fan, Qing Li, Yuntao Du
2024-02-08
Abstract: Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable due to high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning (PEFT), which seeks to exceed the performance of full fine-tuning with minimal parameter modifications. This survey provides a comprehensive overview and future directions for visual PEFT, offering a systematic review of the latest advancements. First, we provide a formal definition of PEFT and discuss model pre-training methods. We then categorize existing methods into three categories: addition-based, partial-based, and unified-based. Finally, we introduce the commonly used datasets and applications and suggest potential future research challenges. A comprehensive collection of resources is available at
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper surveys Parameter-Efficient Fine-Tuning (PEFT) methods for Pre-Trained Vision Models (PVMs) and outlines future research directions. Large-scale pre-trained vision models have demonstrated strong performance on a wide range of downstream computer vision tasks. However, because of their massive parameter counts (reaching billions or even trillions), traditional full-model fine-tuning incurs computational and storage costs that are becoming unsustainable. To address this challenge, researchers have proposed parameter-efficient fine-tuning methods, which aim to match or even exceed the performance of full-model fine-tuning while updating only a small fraction of the parameters. These methods rely on the strong generalization of large models pre-trained on rich data, and assume that most parameters can be shared across new tasks without significant modification.

The specific contributions of the paper are as follows:

1. **Definition and Classification**: A formal definition of parameter-efficient fine-tuning is provided, and model pre-training methods are discussed. Existing methods are then classified into three categories: Addition-based Tuning, Partial-based Tuning, and Unified-based Tuning.
2. **Method Overview**:
   - **Addition-based Tuning**: Includes Adapter Tuning, Prompt Tuning, Prefix Tuning, and Side Tuning. These methods learn task-specific information by adding trainable modules or parameters to the original model.
   - **Partial-based Tuning**: Includes Specification Tuning and Reparameter Tuning. These methods update a small portion of the model's inherent parameters while keeping most parameters unchanged.
   - **Unified-based Tuning**: Uses a unified framework that integrates different fine-tuning methods into a single coordinated architecture to improve overall efficiency and effectiveness.
3. **Application Introduction**: Introduces applications of PEFT methods in real-world scenarios.
4. **Future Challenges**: Points out potential challenges and research directions in the PEFT field.

With this paper, the authors aim to provide a systematic review of PEFT methods in the vision field, together with an overview of the latest progress, in order to promote further development of the area.
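
To make the "small fraction of the parameters" claim concrete, the following is a minimal sketch that counts trainable parameters for a single dense layer under full fine-tuning and under three PEFT-style strategies. It is an illustration under stated assumptions, not the survey's own formulation: the `LinearLayer` class, the function names, the 768-dimensional layer, and the rank of 8 are all hypothetical choices. The bottleneck adapter stands in for addition-based tuning, the bias-only update is one plausible instance of specification tuning, and the low-rank update (in the style of LoRA) is one plausible instance of reparameter tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearLayer:
    """Shape of one dense layer y = W x + b in a frozen backbone (hypothetical)."""
    d_in: int
    d_out: int

    @property
    def total_params(self) -> int:
        # Weight matrix (d_out x d_in) plus bias vector (d_out).
        return self.d_in * self.d_out + self.d_out


def full_finetune_params(layer: LinearLayer) -> int:
    """Full fine-tuning: every weight and bias is trainable."""
    return layer.total_params


def bottleneck_adapter_params(layer: LinearLayer, r: int) -> int:
    """Addition-based (assumed design): a new module attached to the layer
    output that projects d_out -> r -> d_out, with a bias on each projection."""
    d = layer.d_out
    return (d * r + r) + (r * d + d)


def bias_only_params(layer: LinearLayer) -> int:
    """Partial-based, specification-style (assumed): train only the existing bias."""
    return layer.d_out


def low_rank_update_params(layer: LinearLayer, r: int) -> int:
    """Partial-based, reparameterization-style (assumed): W stays frozen and a
    low-rank update B @ A is learned, with B: d_out x r and A: r x d_in."""
    return r * (layer.d_in + layer.d_out)


def trainable_fraction(trainable: int, layer: LinearLayer) -> float:
    """Trainable parameters relative to the size of the original layer."""
    return trainable / layer.total_params


if __name__ == "__main__":
    layer = LinearLayer(d_in=768, d_out=768)  # illustrative size only
    strategies = {
        "full fine-tuning": full_finetune_params(layer),
        "bottleneck adapter (r=8)": bottleneck_adapter_params(layer, 8),
        "bias only": bias_only_params(layer),
        "low-rank update (r=8)": low_rank_update_params(layer, 8),
    }
    for name, count in strategies.items():
        print(f"{name}: {count} trainable, {trainable_fraction(count, layer):.2%} of layer")
```

Under these assumptions, each of the three PEFT-style strategies trains only a few percent or less of the layer's parameters. The counts say nothing about accuracy; they only illustrate why the storage and compute cost per downstream task drops when the backbone is frozen.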