Abstract: Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. As pre-trained VLMs grow rapidly in size, parameter-efficient transfer learning (PETL) has gained attention as a viable alternative to full fine-tuning. One such approach is the adapter, which inserts a small number of trainable parameters into a pre-trained model while keeping the original parameters frozen during adaptation. In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, in which the projected query and value features constitute the node features and the attention matrix constitutes the graph adjacency matrix. Within this framework, tuning adapters in VLMs requires handling heterophilic graphs, owing to the disparity between the projected query and value spaces. To address this challenge, we propose a new adapter architecture, $p$-adapter, which employs $p$-Laplacian message passing from Graph Neural Networks (GNNs). Specifically, the attention weights are re-normalized based on the features, and the features are then aggregated using the calibrated attention matrix, which enables dynamic exploitation of information at varying frequencies in the heterophilic attention graphs. We conduct extensive experiments on different pre-trained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results show that our method significantly outperforms other PETL methods.
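The calibrate-then-aggregate step described in the abstract can be illustrated with a small NumPy sketch. This is a simplified, hypothetical rendering under stated assumptions, not the paper's exact $p$-adapter formulation: it assumes a row-stochastic attention matrix as the adjacency, scales each edge by $\lVert x_i - v_j \rVert^{p-2}$ in the spirit of $p$-Laplacian message passing, re-normalizes each row, and aggregates the value features. The function name `p_laplacian_aggregate` and the parameters `p` and `eps` are illustrative choices.

```python
import numpy as np


def p_laplacian_aggregate(attn, x_tgt, x_src, p=1.5, eps=1e-6):
    """Hypothetical sketch of p-Laplacian-style attention calibration.

    attn:  (n_tgt, n_src) row-stochastic attention matrix (graph adjacency)
    x_tgt: (n_tgt, d) features of the receiving nodes (e.g. projected queries)
    x_src: (n_src, d) features of the sending nodes (e.g. projected values)
    Returns the aggregated features and the calibrated adjacency matrix.
    """
    # Pairwise Euclidean distances between receiving and sending node features.
    diff = x_tgt[:, None, :] - x_src[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Scale every edge by ||x_i - v_j||^(p - 2); eps keeps the base positive.
    # p = 2 leaves the attention weights unchanged, p < 2 down-weights edges
    # between dissimilar nodes, and p > 2 up-weights them.
    calibrated = attn * (dist + eps) ** (p - 2)
    # Re-normalize each row so the calibrated weights sum to one again.
    calibrated = calibrated / calibrated.sum(axis=1, keepdims=True)
    # Message passing: aggregate sending-node features with the new adjacency.
    return calibrated @ x_src, calibrated


# Usage example with random queries, values, and softmax attention.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
q = rng.normal(size=(4, 8))
v = rng.normal(size=(6, 8))
out, weights = p_laplacian_aggregate(attn, q, v, p=1.5)
print(out.shape)  # (4, 8)
```

With `p=2` the sketch reduces exactly to ordinary attention aggregation (`attn @ v`), so the exponent acts as a knob that shifts emphasis between low-frequency (similar-neighbor) and high-frequency (dissimilar-neighbor) information on the attention graph.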

Adapter Tuning with Task-Aware Attention Mechanism

Iterative Task-adaptive Pretraining for Unsupervised Word Alignment

Experience Adapter: Adapting Pre-trained Language Models for Continual Task Planning

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

Parameter-Efficient Fine-Tuning With Adapters

Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models

MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks

Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning

TaD: A Plug-and-Play Task-Aware Decoding Method to Better Adapt LLMs on Downstream Tasks

Lightweight Adapter Tuning for Multilingual Speech Translation

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

MerA: Merging Pretrained Adapters for Few-Shot Learning

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Task-Conditional Adapter for Multi-Task Dense Prediction

KronA: Parameter Efficient Tuning with Kronecker Adapter

Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters

Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters