Abstract: As pre-trained language models (PLMs) grow rapidly in size, full fine-tuning becomes prohibitively expensive in both training cost and storage. In vision-and-language (VL), parameter-efficient tuning (PET) techniques have been proposed that integrate modular modifications (e.g., Adapter and LoRA) into encoder-decoder PLMs. By tuning only a small set of trainable parameters, these techniques perform on par with full fine-tuning. However, excessive modular modifications and neglect of the functionality gap between encoders and decoders can degrade performance, and existing PET techniques (e.g., VL-Adapter) overlook these critical issues. In this paper, we propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework that imposes effective control over modular modifications via a novel granularity-controlled mechanism. Depending on the granularity-controlled matrices this mechanism generates, a variety of model-agnostic VL-PET modules can be instantiated from the framework, offering better trade-offs between efficiency and effectiveness. We further propose lightweight PET module designs that enhance VL alignment and modeling in the encoders and maintain text generation in the decoders. Extensive experiments on four image-text tasks and four video-text tasks demonstrate the efficiency, effectiveness, and transferability of the VL-PET framework. In particular, VL-PET-large with the lightweight PET module designs significantly outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37% (7.03%) with BART-base (T5-base) on image-text tasks. Furthermore, we validate that applying our VL-PET designs to existing PET techniques enhances them as well, yielding significant performance improvements. Our code is available at https://github.com/HenryHZY/VL-PET.
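The abstract names a granularity-controlled mechanism but does not define it. As a rough illustration only, the Python sketch below assumes one plausible reading, which is not the VL-PET formulation: a bottleneck adapter produces a modification Δ from N×d hidden states H, a sigmoid gate generated from H yields k values per token, and the gate is broadcast to an N×d matrix G so the output is H + G ⊙ Δ. The class `GatedAdapter`, the helper `gate_width`, and the level names "large" (k = d), "middle" (k = number of heads) and "small" (k = 1) are hypothetical, borrowed loosely from the abstract's "VL-PET-large".

```python
import math
import random

# Hypothetical sketch: NOT the VL-PET formulation, just one way a
# granularity-controlled gate could modulate an adapter's modification.

def matmul(a, b):
    """Multiply an (n x m) by an (m x p) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def gate_width(granularity, d, n_heads):
    """Number of gate values per token at each (assumed) granularity level."""
    if granularity == "large":
        return d        # one gate per hidden dimension
    if granularity == "middle":
        return n_heads  # one gate shared by each block of d // n_heads dimensions
    if granularity == "small":
        return 1        # one gate for the whole token
    raise ValueError(f"unknown granularity: {granularity}")

class GatedAdapter:
    """Bottleneck adapter whose output is scaled by a sigmoid gate matrix G."""

    def __init__(self, d, bottleneck, granularity, n_heads=4, seed=0):
        if d % n_heads:
            raise ValueError("d must be divisible by n_heads")
        rng = random.Random(seed)
        self.d = d
        self.k = gate_width(granularity, d, n_heads)
        self.w_down = rand_matrix(d, bottleneck, rng)
        self.w_up = rand_matrix(bottleneck, d, rng)
        self.w_gate = rand_matrix(d, self.k, rng)  # generates G from the hidden states

    def num_params(self):
        """Trainable parameters: adapter weights plus gate generator (no biases)."""
        return sum(len(w) * len(w[0]) for w in (self.w_down, self.w_up, self.w_gate))

    def _expand(self, gates):
        """Broadcast k gate values per token to d values (contiguous blocks)."""
        block = self.d // self.k
        return [[row[i // block] for i in range(self.d)] for row in gates]

    def __call__(self, h):
        """h: N x d hidden states -> h + G * delta (element-wise), also N x d."""
        down = [[max(0.0, x) for x in row] for row in matmul(h, self.w_down)]  # ReLU
        delta = matmul(down, self.w_up)  # the modular modification
        gates = [[sigmoid(x) for x in row] for row in matmul(h, self.w_gate)]
        g = self._expand(gates)  # N x d granularity-controlled matrix
        return [[hv + gv * dv for hv, gv, dv in zip(hr, gr, dr)]
                for hr, gr, dr in zip(h, g, delta)]

if __name__ == "__main__":
    h = [[0.1 * (t + i) for i in range(8)] for t in range(3)]  # N = 3 tokens, d = 8
    for level in ("large", "middle", "small"):
        layer = GatedAdapter(d=8, bottleneck=2, granularity=level)
        out = layer(h)
        print(level, layer.num_params(), len(out), len(out[0]))
    # large 96 3 8
    # middle 64 3 8
    # small 40 3 8
```

In this sketch, coarser granularity shares each gate across more dimensions, so the gate generator shrinks from d×d to d×1 weights while the adapter itself is unchanged. That is the kind of efficiency/effectiveness knob the abstract describes; how VL-PET actually generates and applies its matrices should be checked against the paper and the linked repository.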

Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models

Arbitrary Few Parameters Are Good Enough for Adapting Large-scale Pre-trained Language Models

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

ADT: an Additive Delta-Tuning Approach for Parameter-Efficient Tuning in Pre-Trained Language Models

Scaled Prompt-Tuning for Few-Shot Natural Language Generation

Parameter-efficient fine-tuning of large-scale pre-trained language models

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

Rethinking Efficient Tuning Methods from a Unified Perspective

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

When Parameter-efficient Tuning Meets General-purpose Vision-language Models

Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study

PETapter: Leveraging PET-style classification heads for modular few-shot parameter-efficient fine-tuning

Small Pre-trained Language Models Can Be Fine-tuned As Large Models Via Over-Parameterization

Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

PVP: Pre-trained Visual Parameter-Efficient Tuning