PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning

Zhisheng Lin, Han Fu, Chenghao Liu, Zhuo Li, Jianling Sun
2024-06-06
Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as an effective method for adapting pre-trained language models to various tasks efficiently. Recently, there has been a growing interest in transferring knowledge from one or multiple tasks to the downstream target task to achieve performance improvements. However, current approaches typically either train adapters on individual tasks or distill shared knowledge from source tasks, failing to fully exploit task-specific knowledge and the correlation between source and target tasks. To overcome these limitations, we propose PEMT, a novel parameter-efficient fine-tuning framework based on multi-task transfer learning. PEMT extends the mixture-of-experts (MoE) framework to capture the transferable knowledge as a weighted combination of adapters trained on source tasks. These weights are determined by a gated unit, measuring the correlation between the target and each source task using task description prompt vectors. To fully exploit the task-specific knowledge, we also propose the Task Sparsity Loss to improve the sparsity of the gated unit. We conduct experiments on a broad range of tasks over 17 datasets. The experimental results demonstrate our PEMT yields stable improvements over full fine-tuning, and state-of-the-art PEFT and knowledge transferring methods on various tasks. The results highlight the effectiveness of our method which is capable of sufficiently exploiting the knowledge and correlation features across multiple tasks.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Problems with existing methods**: Current parameter-efficient fine-tuning (PEFT) methods typically either train adapters independently for each task or distill shared knowledge from source tasks into the downstream task. Both approaches fail to fully exploit task-specific knowledge and the correlation between source and target tasks.
2. **Challenges in multi-task transfer learning**: Most existing methods focus on the knowledge shared among all source tasks and neglect task-specific knowledge. Source and target task representations are usually trained independently, so the correlations between them are underused, and the representations may be inconsistent with each other, which hinders cross-task adaptation. Knowledge obtained from source tasks is used only for initialization, so it may become entangled with the downstream task and be gradually forgotten during fine-tuning.

To address these issues, the paper proposes PEMT (Parameter-Efficient Multi-Task Transfer Learning), a parameter-efficient fine-tuning framework based on multi-task transfer learning. PEMT extends the mixture-of-experts (MoE) architecture so that transferable knowledge is represented as a weighted combination of adapters trained on the source tasks. The weights come from a gated unit that measures the correlation between the target task and each source task using task description prompt vectors, and a Task Sparsity Loss encourages the gate to be sparse so that task-specific knowledge is better exploited. In experiments over 17 datasets, PEMT yields stable improvements over full fine-tuning and over state-of-the-art PEFT and knowledge-transfer methods, and it also performs well in few-shot learning scenarios.
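The gating idea described above can be illustrated with a small sketch. This is not the paper's implementation: the text here does not give the exact form of the gated unit or of the Task Sparsity Loss, so the scaled dot-product similarity and the entropy penalty below are stand-in assumptions, and all function names (`gate_weights`, `mix_expert_outputs`, `entropy_penalty`) are hypothetical. Expert outputs are passed in as plain vectors in place of real adapter modules.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gate_weights(target_prompt, source_prompts):
    """Assumed gate: softmax over scaled dot-product similarity between
    the target task prompt vector and each source task prompt vector."""
    scale = math.sqrt(len(target_prompt))
    return softmax([dot(target_prompt, p) / scale for p in source_prompts])


def mix_expert_outputs(expert_outputs, weights):
    """Weighted combination of the source-task adapter (expert) outputs."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


def entropy_penalty(weights, eps=1e-12):
    """Assumed stand-in for a sparsity term: the entropy of the gate
    distribution. Lower entropy means the gate relies on fewer experts."""
    return -sum(w * math.log(w + eps) for w in weights)


# Toy example: three source tasks, 2-dimensional prompt vectors.
target = [1.0, 0.0]
sources = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
weights = gate_weights(target, sources)

# Placeholder adapter outputs for one hidden state, one per source task.
outputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mixed = mix_expert_outputs(outputs, weights)
penalty = entropy_penalty(weights)
```

In this toy setup the first source prompt is most aligned with the target prompt, so it receives the largest gate weight. Adding a term like `entropy_penalty` to the training loss would push the gate toward a few strongly correlated source tasks, which is the general effect the abstract attributes to the Task Sparsity Loss.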