Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning

Hao Zhao, Jie Fu, Zhaofeng He
2023-11-11
Abstract: Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting pre-trained language models to downstream tasks while updating only a small number of parameters. Despite this success, most existing methods adapt to each task independently, without considering knowledge transfer between tasks, and are limited in low-data regimes. To overcome this issue, we propose Prototype-based HyperAdapter (PHA), a novel framework built on adapter-tuning and a hypernetwork. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This leads to performance improvements comparable to existing PEFT methods on multi-task learning and few-shot transfer learning. More importantly, when the available data size gets smaller, our method outperforms other strong baselines by a large margin. Based on extensive experiments across various datasets, we demonstrate that PHA strikes a better trade-off between trainable parameters, accuracy on downstream tasks, and sample efficiency.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper attempts to address two main issues in multi-task tuning:

1. **Trade-off between parameter efficiency and task adaptability**: Existing parameter-efficient tuning methods typically adapt to each task independently, without transferring knowledge between tasks. As a result, they perform poorly in low-data scenarios.
2. **Generalization to new tasks**: When facing a new task, existing methods either require task-specific prior knowledge or rely on the knowledge stored in the pre-trained model, which limits their ability to generalize.

To overcome these issues, the authors propose Prototype-based HyperAdapter (PHA), a novel framework that combines adapter-tuning with a hypernetwork. By introducing an instance-dense retriever and a prototypical hypernetwork, PHA supports multi-task learning and generalization to new tasks in a sample-efficient manner.

### Main Contributions

1. **Sample-efficient multi-task learning**: PHA achieves performance comparable to existing parameter-efficient tuning methods while remaining parameter-efficient, and it performs better in low-data scenarios.
2. **Generalization to new tasks**: PHA keeps semantic feature prototypes of previously seen tasks and generalizes to a new task by matching it to the corresponding prototypes.
3. **Experimental validation**: The authors run extensive experiments on multiple NLP datasets, validating the effectiveness of PHA in multi-task learning and few-shot transfer learning.

### Method Overview

The main components of PHA are:

1. **Instance-dense retriever**: Separates instances of different tasks in the embedding space. It is trained with the InfoNCE loss, which pulls instances of the same task together and pushes instances of different tasks apart.
2. **Prototypical hypernetwork**: Estimates task-specific prototypes at the instance level and trains these prototypes as embedding vectors. The hypernetwork then uses these prototypes to generate the parameters of the task-specific adapter layers.

### Experimental Results

1. **Multi-task adaptation**: On the GLUE and SuperGLUE benchmarks, PHA outperforms existing multi-task tuning methods in terms of parameter efficiency and performance.
2. **Low-data adaptation**: With limited training data, PHA significantly outperforms the other baselines, especially when the data volume is very small.
3. **Few-shot adaptation**: PHA performs well in few-shot transfer learning, generalizing to new tasks even with very few training samples.

### Conclusion

By introducing an instance-dense retriever and a prototypical hypernetwork, PHA addresses both parameter efficiency and generalization to new tasks in multi-task tuning, offering a new approach to multi-task learning and few-shot transfer learning.
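The two components listed under the method overview can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several simplifications are assumed: the names `info_nce_loss`, `estimate_prototypes`, `retrieve_task`, and `HyperAdapter` are hypothetical; the instance embeddings are synthetic stand-ins for the retriever's output; the InfoNCE variant treats every other same-task instance in the batch as a positive; each prototype is simply the normalized mean of its task's embeddings (the paper instead trains prototypes as embedding vectors); and the hypernetwork is a pair of linear maps that emit the down- and up-projection weights of a residual bottleneck adapter.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(z, task_ids, temperature=0.1):
    """Hypothetical InfoNCE-style loss: same-task instances are positives,
    all other instances in the batch are negatives."""
    z = l2_normalize(z)
    sim = z @ z.T / temperature                    # scaled cosine similarities, (n, n)
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)        # an anchor is never its own candidate
    row_max = sim.max(axis=1, keepdims=True)       # for a numerically stable log-sum-exp
    log_prob = sim - (row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True)))
    pos = (task_ids[:, None] == task_ids[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0                          # skip anchors without a positive
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_pos.mean())


def estimate_prototypes(z, task_ids, num_tasks):
    """Simplified prototype per task: the normalized mean of that task's embeddings."""
    z = l2_normalize(z)
    return np.stack([l2_normalize(z[task_ids == t].mean(axis=0)) for t in range(num_tasks)])


def retrieve_task(query_z, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    return np.argmax(l2_normalize(query_z) @ prototypes.T, axis=-1)


class HyperAdapter:
    """Hypothetical linear hypernetwork that turns a prototype into adapter weights."""

    def __init__(self, emb_dim, hidden_dim, bottleneck, rng, scale=0.02):
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        self.gen_down = rng.normal(0.0, scale, size=(emb_dim, hidden_dim * bottleneck))
        self.gen_up = rng.normal(0.0, scale, size=(emb_dim, bottleneck * hidden_dim))

    def generate(self, prototype):
        # Map the prototype to flat weight vectors, then reshape into matrices.
        down = (prototype @ self.gen_down).reshape(self.hidden_dim, self.bottleneck)
        up = (prototype @ self.gen_up).reshape(self.bottleneck, self.hidden_dim)
        return down, up

    def forward(self, h, prototype):
        down, up = self.generate(prototype)
        return h + np.maximum(h @ down, 0.0) @ up  # residual: h + ReLU(h W_down) W_up


# Synthetic "retriever outputs": three tasks, eight instances each, clustered per task.
num_tasks, per_task, emb_dim, hidden_dim = 3, 8, 16, 32
task_ids = np.repeat(np.arange(num_tasks), per_task)
centers = rng.normal(size=(num_tasks, emb_dim))
z = centers[task_ids] + 0.1 * rng.normal(size=(len(task_ids), emb_dim))

loss = info_nce_loss(z, task_ids)
protos = estimate_prototypes(z, task_ids, num_tasks)
pred = retrieve_task(z, protos)
print(f"InfoNCE loss: {loss:.3f}, retrieval accuracy: {(pred == task_ids).mean():.2f}")

adapter = HyperAdapter(emb_dim, hidden_dim, bottleneck=4, rng=rng)
h = rng.normal(size=(5, hidden_dim))              # hidden states of 5 tokens
out = adapter.forward(h, protos[pred[0]])         # adapter conditioned on the retrieved task
print(out.shape)  # (5, 32)
```

In this sketch only the hypernetwork's generator matrices are shared across tasks; the adapter weights for any task are produced on demand from its prototype, so an instance from an unseen task can be routed to the nearest existing prototype without storing a separate adapter per task.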