Unlocking the Global Synergies in Low-Rank Adapters

Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao
2024-06-21
Abstract: Low-rank Adaptation (LoRA) has been the de-facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the efficacy of HeteroLoRA by performing the allocation in a more challenging search space that includes LoRA modules and LoRA-adapted shortcut connections. Experiments show that HeteroLoRA enables improvements in model performance given the same parameter budget. For example, on MRPC, we see an improvement of 1.6% in accuracy with a similar training parameter budget. We will open-source our algorithm once the paper is accepted.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper addresses parameter efficiency when fine-tuning large language models (LLMs). It proposes HeteroLoRA, a lightweight search algorithm that allocates a limited budget of trainable parameters across the Low-Rank Adaptation (LoRA) modules of a model. HeteroLoRA uses zero-cost proxies to estimate the importance of each LoRA module, then decides which modules to enable or disable under that budget. The paper also enlarges the search space to include LoRA-adapted shortcut connections, so that the allocation covers both standard LoRA modules and these shortcuts. Experiments show improved model performance under the same parameter budget; on MRPC, for example, accuracy improves by 1.6% with a similar trainable-parameter budget. In short, the question is how to distribute a limited number of trainable parameters across LoRA modules and LoRA-adapted shortcut connections so as to maximize fine-tuned performance on a given task.
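To make the budgeted enable/disable decision concrete, here is a minimal sketch in Python. It is not the authors' implementation: the text above does not say which zero-cost proxy is used or how modules are chosen, so the scores are treated as given inputs and a simple greedy rule (highest score first, skip anything that no longer fits) stands in for the search. The names `LoraCandidate`, `lora_param_count`, and `select_modules`, and the module names in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LoraCandidate:
    """One LoRA module that may be enabled (hypothetical structure)."""
    name: str        # illustrative module name, e.g. "q_proj"
    score: float     # proxy score, assumed given; higher means more important
    num_params: int  # trainable parameters added if this module is enabled


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA module on a d_in -> d_out linear layer.

    Standard LoRA adds two factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def select_modules(candidates: Sequence[LoraCandidate], budget: int) -> List[str]:
    """Greedy stand-in for the search: enable modules in descending score
    order, skipping any module that would exceed the remaining budget."""
    enabled: List[str] = []
    remaining = budget
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.num_params <= remaining:
            enabled.append(cand.name)
            remaining -= cand.num_params
    return enabled


# Usage: four candidate modules on 768 -> 768 projections at rank 8,
# with a budget that covers exactly two of them. Scores are made up.
per_module = lora_param_count(768, 768, rank=8)  # 8 * (768 + 768) = 12288
candidates = [
    LoraCandidate("q_proj", 0.9, per_module),
    LoraCandidate("k_proj", 0.1, per_module),
    LoraCandidate("v_proj", 0.7, per_module),
    LoraCandidate("o_proj", 0.4, per_module),
]
print(select_modules(candidates, budget=2 * per_module))  # ['q_proj', 'v_proj']
```

Under this sketch, a LoRA-adapted shortcut connection would simply be one more candidate with its own score and parameter cost. How the paper actually scores and searches over such candidates is not specified in the text above.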