Abstract: The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training under limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm that maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
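
To make the idea of combining a universal expert with routed domain-specific experts more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the exact wiring, so the additive residual form, the softmax router, and all names (`residual_mixture_forward`, `universal`, `experts`, `router_W`) are assumptions made purely for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_delta(A, B, x):
    """Low-rank update B @ (A @ x); A is r x d_in, B is d_out x r."""
    return matvec(B, matvec(A, x))

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def residual_mixture_forward(x, W0, universal, experts, router_W):
    """Hypothetical layer: frozen base + universal LoRA + gated domain LoRAs.

    ASSUMPTION: the universal expert is always active and added as a
    residual term, while domain experts are weighted by a softmax router.
    """
    base = matvec(W0, x)                             # frozen pretrained weight
    shared = lora_delta(universal[0], universal[1], x)
    gates = softmax(matvec(router_W, x))             # one gate per domain expert
    out = [b + s for b, s in zip(base, shared)]
    for g, (A, B) in zip(gates, experts):
        delta = lora_delta(A, B, x)
        out = [o + g * d for o, d in zip(out, delta)]
    return out, gates

def rand_matrix(rows, cols, rng):
    """Small random matrix, used here only as toy data."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy dimensions and inputs (illustrative values only).
rng = random.Random(0)
d_in, d_out, rank, n_experts = 4, 3, 2, 2
x = [1.0, -0.5, 0.25, 2.0]
W0 = rand_matrix(d_out, d_in, rng)
universal = (rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
experts = [(rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
           for _ in range(n_experts)]
router_W = rand_matrix(n_experts, d_in, rng)

out, gates = residual_mixture_forward(x, W0, universal, experts, router_W)
```

Under these assumptions, a newly added domain expert whose `B` matrix is all zeros contributes nothing to the output, so the layer falls back to the base weights plus the universal expert. This is one way to read the claim that new tasks can be plugged in without disturbing general capability.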

MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning

LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning

PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment

Learning Attentional Mixture of LoRAs for Language Model Continual Learning

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Orthogonal Subspace Learning for Language Model Continual Learning

MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting