PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization
Xiangdi Meng, Damai Dai, Weiyao Luo, Zhe Yang, Shaoxiang Wu, Xiaochen Wang, Peiyi Wang, Qingxiu Dong, Liang Chen, Zhifang Sui
DOI: https://doi.org/10.48550/arxiv.2402.16141
2024-01-01
Abstract: Supervised fine-tuning is the most common method for adapting large language models (LLMs) to downstream tasks, but full fine-tuning of LLMs requires massive computational resources. Recently, parameter-efficient fine-tuning (PEFT) methods have been widely studied because of their cost-effectiveness. LoRA is one of the most widely used of these methods, and it assumes that the optimization process is essentially low-dimensional. Although LoRA fine-tuning is effective, a performance gap with full fine-tuning remains, because its weight update is limited to low-rank matrices. To break the low-rank bottleneck in LoRA optimization, we propose PeriodicLoRA (PLoRA), which accumulates low-rank update matrices multiple times to achieve a higher update rank. PLoRA has multiple training stages. During each stage, we still update only the LoRA weights. However, at the end of each stage, we unload the LoRA weights into the backbone parameters and then reinitialize the LoRA states. Experimental results show that PLoRA has a stronger learning ability, reaching at most approximately 1.8 times that of LoRA, without increasing memory usage. Further, we introduce a momentum-based unloading strategy for PLoRA to mitigate training instability.
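The staged procedure described above (train only the LoRA factors, unload their product into the backbone, reinitialize, repeat) can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a single linear layer fitted by least squares with plain gradient descent in NumPy, omits the usual `alpha / r` LoRA scaling, and does not model the momentum-based unloading, whose details the abstract does not give. The helpers `train_lora_stage`, `plora`, `loss`, and `numerical_rank` are hypothetical names introduced only for this illustration. The point it shows is the rank argument: each stage contributes an update of rank at most `r`, so the accumulated update after `stages` stages can reach rank up to `stages * r`, while only one pair of rank-`r` factors is ever held in memory.

```python
import numpy as np


def loss(W, X, Y):
    """Toy least-squares objective 0.5/n * ||W X - Y||_F^2 (assumption)."""
    return 0.5 * np.sum((W @ X - Y) ** 2) / X.shape[1]


def train_lora_stage(W0, X, Y, r, steps, lr, rng):
    """Hypothetical helper: train one LoRA stage on a frozen backbone W0.

    Only the factors B (d_out x r) and A (r x d_in) are updated; the
    function returns the low-rank update B @ A, which has rank <= r.
    """
    d_out, d_in = W0.shape
    n = X.shape[1]
    # Common LoRA init: A random, B zero, so the update starts at zero.
    A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
    B = np.zeros((d_out, r))
    for _ in range(steps):
        G = ((W0 + B @ A) @ X - Y) @ X.T / n  # gradient w.r.t. the product B A
        # Simultaneous gradient step on both factors (RHS uses the old B, A).
        B, A = B - lr * G @ A.T, A - lr * B.T @ G
    return B @ A


def plora(W0, X, Y, r, stages, steps_per_stage, lr, seed=0):
    """Sketch of periodic LoRA: after each stage, unload B A into the
    backbone, then start the next stage from freshly initialized factors."""
    rng = np.random.default_rng(seed)
    W = W0.copy()
    for _ in range(stages):
        W = W + train_lora_stage(W, X, Y, r, steps_per_stage, lr, rng)  # unload
    return W


def numerical_rank(M, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


# Toy setup (assumption): zero "pretrained" backbone, full-rank target map.
rng = np.random.default_rng(42)
d, n, r = 16, 64, 2
W0 = np.zeros((d, d))
W_target = rng.standard_normal((d, d)) / np.sqrt(d)
X = rng.standard_normal((d, n))
Y = W_target @ X

# Same total number of steps for both runs.
W_lora = W0 + train_lora_stage(W0, X, Y, r, steps=800, lr=0.05,
                               rng=np.random.default_rng(0))
W_plora = plora(W0, X, Y, r, stages=4, steps_per_stage=200, lr=0.05)

print("LoRA : update rank", numerical_rank(W_lora - W0), "loss", loss(W_lora, X, Y))
print("PLoRA: update rank", numerical_rank(W_plora - W0), "loss", loss(W_plora, X, Y))
```

In this toy setting the single LoRA run can never produce an update of rank above 2, whereas the four-stage run accumulates an update of rank up to 8 from the same rank-2 factors. The loss comparison says nothing about the paper's reported 1.8x figure; it only illustrates why unloading and reinitializing lifts the rank ceiling.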