Abstract: Fine-tuning all parameters of a large-scale pre-trained model has become a popular paradigm for transferring its knowledge to various downstream tasks. However, as models grow and the number of downstream tasks rises, this paradigm inevitably runs into challenges of computation cost and memory footprint. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods (e.g., Adapter, LoRA, BitFit) have emerged as a promising way to alleviate these concerns by updating only a portion of the parameters. Although these PEFT methods have demonstrated satisfactory performance in natural language processing, whether they transfer to graph-based tasks with Graph Transformer Networks (GTNs) remains under-explored. In this paper, we fill this gap by providing extensive benchmarks of traditional PEFT methods on a range of graph-based downstream tasks. Our empirical study shows that directly transferring existing PEFT methods to graph-based tasks is sub-optimal due to feature distribution shift. To address this issue, we propose a novel structure-aware PEFT approach, named G-Adapter, which leverages the graph convolution operation to introduce graph structure (e.g., the graph adjacency matrix) as an inductive bias that guides the updating process. In addition, we propose Bregman proximal point optimization to further alleviate feature distribution shift by preventing the model from updating aggressively. Extensive experiments demonstrate that G-Adapter achieves state-of-the-art performance compared with its counterparts on nine graph benchmark datasets based on two pre-trained GTNs, and delivers tremendous memory footprint efficiency compared with the conventional paradigm.
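
To make the structure-aware idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a residual bottleneck adapter whose hidden features are mixed by a symmetrically normalized adjacency matrix before the nonlinearity. The function names (`matmul`, `normalize_adjacency`, `structure_aware_adapter`), the ReLU activation, the added self-loops, and the zero-initialized up-projection are all illustrative assumptions; the abstract states only that a graph convolution operation injects the adjacency matrix as an inductive bias. Bregman proximal point optimization is not covered here.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization common in GCNs."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own feature (assumption).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def structure_aware_adapter(x, adj, w_down, w_up):
    """Hypothetical adapter: x + ReLU(A_norm @ x @ w_down) @ w_up.

    x:      n x d node features from a frozen backbone layer
    adj:    n x n adjacency matrix of the graph
    w_down: d x r trainable down-projection (r much smaller than d)
    w_up:   r x d trainable up-projection
    """
    a_norm = normalize_adjacency(adj)
    hidden = matmul(matmul(a_norm, x), w_down)               # neighborhood mixing + bottleneck
    hidden = [[max(0.0, v) for v in row] for row in hidden]  # ReLU
    update = matmul(hidden, w_up)
    return [[xi + ui for xi, ui in zip(xr, ur)] for xr, ur in zip(x, update)]

# Toy usage: a 3-node path graph, feature width d = 4, bottleneck r = 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
x = [[1.0, 0.0, 2.0, -1.0],
     [0.5, 1.5, 0.0, 1.0],
     [-2.0, 1.0, 1.0, 0.0]]
w_down = [[0.1, -0.2], [0.3, 0.0], [0.0, 0.4], [-0.1, 0.2]]
w_up_zero = [[0.0] * 4 for _ in range(2)]  # zero init: adapter starts as identity

out = structure_aware_adapter(x, adj, w_down, w_up_zero)
trainable = len(w_down) * len(w_down[0]) + len(w_up_zero) * len(w_up_zero[0])
```

In this sketch only `w_down` and `w_up` would be trained (16 values in the toy example), while the backbone that produced `x` stays frozen. Zero-initializing the up-projection means the adapted model initially reproduces the pre-trained features, which is one common way to avoid an abrupt shift at the start of fine-tuning.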

Parameter-Efficient Transfer Learning for NLP

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding

Parameter-Efficient Fine-Tuning With Adapters

Towards a Unified View of Parameter-Efficient Transfer Learning

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

Parameter-efficient Tuning for Large Language Model Without Calculating Its Gradients

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

G-Adapter: Towards Structure-Aware Parameter-Efficient Transfer Learning for Graph Transformer Networks

A new computationally efficient method to tune BERT networks – transfer learning

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers

Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

AdapterGNN: Parameter-Efficient Fine-Tuning Improves Generalization in GNNs

Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters

Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters

Parameter-efficient Weight Ensembling Facilitates Task-level Knowledge Transfer

Towards Optimal Adapter Placement for Efficient Transfer Learning

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm