Abstract: Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable due to high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning (PEFT), which seeks to exceed the performance of full fine-tuning with minimal parameter modifications. This survey provides a comprehensive overview and future directions for visual PEFT, offering a systematic review of the latest advancements. First, we provide a formal definition of PEFT and discuss model pre-training methods. We then categorize existing methods into three categories: addition-based, partial-based, and unified-based. Finally, we introduce the commonly used datasets and applications and suggest potential future research challenges. A comprehensive collection of resources is available at

What problem does this paper attempt to address?

The paper surveys Parameter-Efficient Fine-Tuning (PEFT) methods for Pre-Trained Vision Models (PVMs) and outlines future research directions. Large-scale pre-trained vision models have demonstrated strong performance on a variety of downstream computer vision tasks. However, because of their massive parameter counts (reaching billions or even trillions), traditional full-model fine-tuning incurs high computational and storage costs and is becoming unsustainable. To address this challenge, researchers have proposed parameter-efficient fine-tuning methods, which aim to match or even exceed the performance of full-model fine-tuning while updating only a small portion of the parameters. These methods rely on the strong generalization of large models pre-trained on rich data, and assume that most parameters can be shared across new tasks without significant modification.

The specific contributions of the paper are as follows:

1. **Definition and Classification**: First, a formal definition of parameter-efficient fine-tuning is provided, and model pre-training methods are discussed. Then, existing methods are classified into three categories: Addition-based Tuning, Partial-based Tuning, and Unified-based Tuning.
2. **Method Overview**:
   - **Addition-based Tuning**: Includes Adapter Tuning, Prompt Tuning, Prefix Tuning, and Side Tuning. These methods learn task-specific information by adding trainable modules or parameters to the original model.
   - **Partial-based Tuning**: Includes Specification Tuning and Reparameter Tuning. These methods update a small portion of the model's inherent parameters while keeping most parameters unchanged.
   - **Unified-based Tuning**: Covers unified frameworks that integrate different fine-tuning methods into a single coordinated architecture to improve overall efficiency and effectiveness.
3. **Application Introduction**: Introduces application cases of PEFT methods in real-world scenarios.
4. **Future Challenges**: Points out potential challenges and research directions in the PEFT field.

With this paper, the authors aim to provide a systematic review of PEFT methods in the vision field, together with an overview of the latest progress, to promote the development of this area.
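To make the "minimal portion of the parameters" claim concrete, the following is a rough sketch that counts trainable parameters for a single square linear layer under the categories listed above. It is an illustration only, not the survey's own formulation: the function names, the layer width `d=768`, and the bottleneck/rank `r=8` are assumptions chosen for the example, and the low-rank variant stands in for reparameterization-style methods in general.

```python
# Illustrative trainable-parameter counts for one d x d linear layer (weights plus bias).
# All names and sizes here are assumptions for the example, not values from the survey.


def full_finetune_params(d: int) -> int:
    """Full fine-tuning: every weight and bias of the layer is trainable."""
    return d * d + d


def adapter_params(d: int, r: int) -> int:
    """Addition-based (adapter-style): a new bottleneck d -> r -> d with biases.

    The original layer stays frozen; only the added module is trained.
    """
    down = d * r + r  # down-projection weights and bias
    up = r * d + d    # up-projection weights and bias
    return down + up


def low_rank_params(d: int, r: int) -> int:
    """Reparameterization-style low-rank update: W + B @ A, with B (d x r) and A (r x d)."""
    return d * r + r * d


def bias_only_params(d: int) -> int:
    """Partial-based (specification-style): train only the existing bias vector."""
    return d


def summarize(d: int, r: int) -> dict:
    """Return {strategy: (trainable_count, fraction_of_full_fine_tuning)}."""
    full = full_finetune_params(d)
    counts = {
        "full": full,
        "adapter": adapter_params(d, r),
        "low_rank": low_rank_params(d, r),
        "bias_only": bias_only_params(d),
    }
    return {name: (n, n / full) for name, n in counts.items()}


if __name__ == "__main__":
    for name, (n, frac) in summarize(d=768, r=8).items():
        print(f"{name:>9}: {n:>7} trainable params ({frac:.2%} of full)")
```

Under these assumed sizes, the adapter-style and low-rank variants each train roughly 2% of what full fine-tuning would for this layer, and the bias-only variant trains well under 1%. Real models have many layers of differing shapes, so the fractions in practice depend on where and how the trainable parameters are placed.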

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment

Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition

Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

Effective and Efficient Few-shot Fine-tuning for Vision Transformers

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

Towards a Unified View on Visual Parameter-Efficient Transfer Learning

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

PVP: Pre-trained Visual Parameter-Efficient Tuning

Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

Visual Fourier Prompt Tuning

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

Bridging Pre-Trained Models to Continual Learning: A Hypernetwork Based Framework with Parameter-Efficient Fine-Tuning Techniques