PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning

Zhisheng Lin, Han Fu, Chenghao Liu, Zhuo Li, Jianling Sun

2024-06-06

Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as an effective method for adapting pre-trained language models to various tasks efficiently. Recently, there has been a growing interest in transferring knowledge from one or multiple tasks to the downstream target task to achieve performance improvements. However, current approaches typically either train adapters on individual tasks or distill shared knowledge from source tasks, failing to fully exploit task-specific knowledge and the correlation between source and target tasks. To overcome these limitations, we propose PEMT, a novel parameter-efficient fine-tuning framework based on multi-task transfer learning. PEMT extends the mixture-of-experts (MoE) framework to capture the transferable knowledge as a weighted combination of adapters trained on source tasks. These weights are determined by a gated unit, measuring the correlation between the target and each source task using task description prompt vectors. To fully exploit the task-specific knowledge, we also propose the Task Sparsity Loss to improve the sparsity of the gated unit. We conduct experiments on a broad range of tasks over 17 datasets. The experimental results demonstrate our PEMT yields stable improvements over full fine-tuning, and state-of-the-art PEFT and knowledge transferring methods on various tasks. The results highlight the effectiveness of our method which is capable of sufficiently exploiting the knowledge and correlation features across multiple tasks.

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Problems with existing methods**: Current parameter-efficient fine-tuning (PEFT) methods typically either train adapters independently for each task or distill shared knowledge from source tasks into the downstream task. Both approaches fail to fully exploit task-specific knowledge and the correlation between source and target tasks.
2. **Challenges in multi-task transfer learning**:
   - Most existing methods focus on knowledge shared among all source tasks and neglect task-specific knowledge.
   - Source and target task representations are usually trained independently, so the correlations between them are underused.
   - Source and target task representations may be inconsistent, which hinders cross-task adaptation.
   - Knowledge obtained from source tasks is used only for initialization, so it may become entangled with the downstream task and be gradually forgotten during fine-tuning.

To address these issues, the paper proposes PEMT (Parameter-Efficient Multi-Task Transfer Learning), a parameter-efficient fine-tuning framework based on multi-task transfer learning. PEMT uses a mixture-of-experts (MoE) architecture to capture transferable knowledge as a weighted combination of adapters trained on the source tasks. The weights come from a gated unit that measures the correlation between the target task and each source task using task description prompt vectors. According to the reported experiments, PEMT yields stable improvements over full fine-tuning and other PEFT methods across 17 NLP datasets, and it also performs well in few-shot learning scenarios.
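The gating idea described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation, and it rests on several assumptions. The dot-product correlation score, the `temperature` parameter, the toy adapters, and the entropy penalty standing in for the Task Sparsity Loss are all hypothetical choices made here for clarity. The paper's actual gate and loss may be defined differently.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Adapter = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(target_prompt: Vector,
                 source_prompts: Sequence[Vector],
                 temperature: float = 1.0) -> List[float]:
    """One weight per source task.

    Assumption: correlation is scored as a dot product between the target
    task's prompt vector and each source task's prompt vector.
    """
    scores = [
        sum(t * s for t, s in zip(target_prompt, prompt)) / temperature
        for prompt in source_prompts
    ]
    return softmax(scores)


def mix_adapters(x: Vector,
                 adapters: Sequence[Adapter],
                 weights: Sequence[float]) -> Vector:
    """Weighted combination of the outputs of the source-task adapters."""
    outputs = [adapter(x) for adapter in adapters]
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(dim)]


def sparsity_penalty(weights: Sequence[float]) -> float:
    """Entropy of the gate weights; lower means a sparser gate.

    Assumption: entropy is used here only as a stand-in for a sparsity
    objective, not as the paper's Task Sparsity Loss.
    """
    return -sum(w * math.log(w) for w in weights if w > 0.0)


# Toy usage: three hypothetical source-task adapters and prompt vectors.
adapters = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
source_prompts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target_prompt = [2.0, 0.5]

weights = gate_weights(target_prompt, source_prompts)
mixed = mix_adapters([1.0, 2.0], adapters, weights)
penalty = sparsity_penalty(weights)
```

In this toy setup the first source task receives the largest weight because its prompt vector is most aligned with the target prompt. Adding `penalty` to a training objective would push the gate toward relying on fewer source tasks.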

PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme Multi-Profile Scenarios

LPT++: Efficient Training on Mixture of Long-tailed Experts

One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

Bridging Pre-Trained Models to Continual Learning: A Hypernetwork Based Framework with Parameter-Efficient Fine-Tuning Techniques

MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts

Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning

MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition

CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models