Abstract: Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting pre-trained language models to downstream tasks while updating only a small number of parameters. Despite this success, most existing methods adapt to each task independently, without considering knowledge transfer between tasks, and are limited in low-data regimes. To overcome this issue, we propose Prototype-based HyperAdapter (PHA), a novel framework built on adapter-tuning and hypernetworks. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This leads to comparable performance improvements against existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, when the available data size gets smaller, our method outperforms other strong baselines by a large margin. Based on our extensive empirical experiments across various datasets, we demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on downstream tasks, and sample efficiency.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper attempts to address two main issues in multi-task tuning:

1. **Trade-off between parameter efficiency and task adaptability**: Existing parameter-efficient tuning methods typically adapt to each task independently without considering knowledge transfer between tasks. This leads to poor performance in low-data scenarios.
2. **Generalization to new tasks**: Existing methods require specific task prior knowledge or rely on the knowledge of pre-trained models when facing new tasks, which limits their generalization ability.

To overcome these issues, the authors propose a Prototype-based HyperAdapter (PHA), a novel framework that combines adapter-tuning and hypernetwork. PHA achieves multi-task learning and generalization to new tasks in a sample-efficient manner by introducing an instance-dense retriever and a prototypical hypernetwork.

### Main Contributions

1. **Sample-efficient multi-task learning**: PHA can achieve performance improvements comparable to existing parameter-efficient tuning methods while maintaining parameter efficiency, especially performing better in low-data scenarios.
2. **Generalization to new tasks**: PHA effectively generalizes to new tasks by maintaining semantic feature prototypes of previous tasks and matching corresponding prototypes in new tasks.
3. **Experimental validation**: The authors conducted extensive experiments on multiple NLP datasets, validating the effectiveness of PHA in multi-task learning and few-shot transfer learning.

### Method Overview

The main components of PHA include:

1. **Instance-dense retriever**: Used to distinguish instances of different tasks in the embedding space, trained with the InfoNCE loss function to cluster instances of the same task together and separate instances of different tasks.
2. **Prototypical hypernetwork**: Estimates task-specific prototypes at the instance level and trains these prototypes as embedding vectors. The hypernetwork uses these prototypes to generate task-specific adapter layer parameters.

### Experimental Results

1. **Multi-task adaptation**: On the GLUE and SuperGLUE benchmarks, PHA outperforms existing multi-task tuning methods in terms of parameter efficiency and performance.
2. **Low-data adaptation**: In scenarios with limited data, PHA significantly outperforms other baseline methods, especially when the data volume is very small.
3. **Few-shot adaptation**: PHA excels in few-shot transfer learning, effectively generalizing to new tasks even with a very small number of training samples.

### Conclusion

By introducing an instance-dense retriever and a prototypical hypernetwork, PHA effectively addresses the issues of parameter efficiency and generalization to new tasks in multi-task tuning, providing a new solution for multi-task learning and few-shot transfer learning.
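
### Illustrative Sketch

The two components named in the method overview can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes a supervised InfoNCE variant in which instances of the same task are positives, approximates each prototype by the mean embedding of its task (the summary says the prototypes are trained as embedding vectors), and uses a single linear map as the hypernetwork. All names (`info_nce_loss`, `task_prototypes`, `LinearHypernetwork`, `adapter_forward`) and the dimensions are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def info_nce_loss(embeddings, task_ids, temperature=0.1):
    """Supervised InfoNCE over a batch: same-task instances act as positives.

    Assumption: the loss is averaged over all (anchor, positive) pairs.
    """
    z = [normalize(e) for e in embeddings]
    losses = []
    for i, anchor in enumerate(z):
        # Scaled cosine similarity to every other instance (self excluded).
        sims = {j: dot(anchor, z[j]) / temperature
                for j in range(len(z)) if j != i}
        positives = [j for j in sims if task_ids[j] == task_ids[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all candidates.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.extend(log_denom - sims[j] for j in positives)
    return sum(losses) / len(losses) if losses else 0.0


def task_prototypes(embeddings, task_ids):
    """Mean embedding per task (a stand-in for learned prototypes)."""
    sums, counts = {}, {}
    for e, t in zip(embeddings, task_ids):
        acc = sums.setdefault(t, [0.0] * len(e))
        for k, a in enumerate(e):
            acc[k] += a
        counts[t] = counts.get(t, 0) + 1
    return {t: [a / counts[t] for a in acc] for t, acc in sums.items()}


def reshape(flat, rows, cols):
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class LinearHypernetwork:
    """Maps a task prototype to the weights of one bottleneck adapter."""

    def __init__(self, proto_dim, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        n_out = 2 * hidden_dim * bottleneck_dim  # down + up projections
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(proto_dim)]
                       for _ in range(n_out)]

    def generate(self, prototype):
        flat = [dot(row, prototype) for row in self.weight]
        split = self.hidden_dim * self.bottleneck_dim
        down = reshape(flat[:split], self.bottleneck_dim, self.hidden_dim)
        up = reshape(flat[split:], self.hidden_dim, self.bottleneck_dim)
        return down, up


def adapter_forward(hidden, down, up):
    """Bottleneck adapter: down-project, ReLU, up-project, add residual."""
    bottleneck = [max(0.0, dot(row, hidden)) for row in down]
    return [h + dot(row, bottleneck) for h, row in zip(hidden, up)]


if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    task_ids = [0, 0, 1, 1]
    print("InfoNCE loss:", info_nce_loss(embeddings, task_ids))
    protos = task_prototypes(embeddings, task_ids)
    hyper = LinearHypernetwork(proto_dim=2, hidden_dim=4, bottleneck_dim=2)
    down, up = hyper.generate(protos[0])
    print("Adapter output:", adapter_forward([0.5, -0.2, 0.1, 0.3], down, up))
```

In this sketch the loss is low when same-task embeddings already cluster together and high when they are interleaved, which is the behavior the retriever is trained toward. Because only the prototype changes between tasks, one shared hypernetwork can emit a different adapter for each task.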

Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning
