LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models Via MoE-Style Plugin.
Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
DOI: https://doi.org/10.18653/v1/2024.acl-long.106
2024-01-01
Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhancing their capabilities in downstream tasks. Substantially increasing the instruction data is a direct way to align the model with a broader range of downstream tasks or to notably improve its performance on a specific task. However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them using a router network, like a plugin version of Mixture of Experts (MoE). It freezes the backbone model and forces a portion of the LoRAs to focus on leveraging world knowledge to solve downstream tasks, thereby alleviating world knowledge forgetting. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks while maintaining the world knowledge stored in the LLM.
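
The plugin structure described in the abstract (a frozen backbone layer, several LoRA experts, and a router that mixes them) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `LoRAMoELinear`, the dense softmax gating, the `alpha / rank` scaling, and all sizes are assumptions chosen for illustration, and the mechanism that steers a portion of the experts toward world knowledge is not shown because the abstract does not specify it.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialised matrix as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELinear:
    """Hypothetical frozen linear layer plus router-weighted LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Backbone weight: frozen, never updated during fine-tuning.
        self.frozen_weight = random_matrix(d_out, d_in, rng)
        # Router: one score per expert (assumed to be a single linear map).
        self.router_weight = random_matrix(num_experts, d_in, rng)
        # Each expert is a LoRA pair (A, B): A projects down to `rank`,
        # B projects back up. B starts at zero (the usual LoRA init),
        # so the layer initially reproduces the backbone output.
        self.experts = [
            (random_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
            for _ in range(num_experts)
        ]
        self.scaling = alpha / rank

    def gate(self, x):
        """Router weights over experts (assumed dense softmax gating)."""
        return softmax(matvec(self.router_weight, x))

    def forward(self, x):
        """Backbone output plus the gate-weighted sum of low-rank updates."""
        out = matvec(self.frozen_weight, x)
        for g, (a, b) in zip(self.gate(x), self.experts):
            delta = matvec(b, matvec(a, x))
            out = [o + g * self.scaling * d for o, d in zip(out, delta)]
        return out


layer = LoRAMoELinear(d_in=8, d_out=6)
x = [0.5] * 8
gates = layer.gate(x)
y = layer.forward(x)
```

In a training setup only the router and the expert matrices would receive gradients, while `frozen_weight` stays fixed; that is what makes the design a plugin on top of the backbone rather than a modification of it.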