Abstract: As the size of pre-trained language models (PLMs) grows rapidly, full fine-tuning becomes prohibitively expensive in both training and storage. In vision-and-language (VL), parameter-efficient tuning (PET) techniques have been proposed that integrate modular modifications (e.g., Adapter and LoRA) into encoder-decoder PLMs. By tuning a small set of trainable parameters, these techniques perform on par with full fine-tuning. However, excessive modular modifications and neglect of the functionality gap between encoders and decoders can degrade performance, and existing PET techniques (e.g., VL-Adapter) overlook these critical issues. In this paper, we propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework that imposes effective control over modular modifications via a novel granularity-controlled mechanism. Depending on the granularity-controlled matrices generated by this mechanism, a variety of model-agnostic VL-PET modules can be instantiated from our framework for better trade-offs between efficiency and effectiveness. We further propose lightweight PET module designs that enhance VL alignment and modeling in the encoders and maintain text generation in the decoders. Extensive experiments on four image-text tasks and four video-text tasks demonstrate the efficiency, effectiveness, and transferability of our VL-PET framework. In particular, our VL-PET-large with lightweight PET module designs significantly outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37% (7.03%) with BART-base (T5-base) on image-text tasks. Furthermore, we validate the benefit of applying our VL-PET designs to existing PET techniques, enabling them to achieve significant performance improvements. Our code is available at https://github.com/HenryHZY/VL-PET.
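To make the phrase "a small set of trainable parameters" concrete, the following is a minimal arithmetic sketch for a generic LoRA-style low-rank update, not the VL-PET mechanism itself. It assumes a single frozen weight matrix of shape d_out x d_in and a rank-r update B @ A; the function names and the example sizes (a 768 x 768 projection with rank 8) are hypothetical and chosen only for illustration.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a low-rank update B @ A, with A: rank x d_in and B: d_out x rank."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of low-rank parameters relative to the frozen d_out x d_in weight."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical sizes: one 768 x 768 projection adapted with rank 8.
full = 768 * 768                           # frozen weight entries
lora = lora_trainable_params(768, 768, 8)  # trainable entries
print(full, lora, f"{trainable_fraction(768, 768, 8):.2%}")
# prints: 589824 12288 2.08%
```

Under these assumed sizes, the low-rank update trains roughly one forty-eighth of the entries in the weight it adapts. The overall trainable share of a real model depends on which layers receive modules, which is the kind of placement and granularity decision the abstract describes.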

Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization

Towards Better Parameter-Efficient Fine-Tuning for Large Language Models: A Position Paper

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: an Empirical Study

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

ADT: an Additive Delta-Tuning Approach for Parameter-Efficient Tuning in Pre-Trained Language Models

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT

VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Large language models capsule: A research analysis of In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) methods

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

Parameter-efficient fine-tuning of large-scale pre-trained language models

Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning

Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models

Parameter-Efficient Fine-Tuning Enhances Adaptation of Single Cell Large Language Model for Cell Type Identification

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs