OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
2024-10-12
Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs. Unlike LoRA, which adds extra adapters to all layers, OwLore strategically assigns higher sampling probabilities to layers with more outliers, selectively sampling only a few layers and fine-tuning their pre-trained weights. To further increase the number of fine-tuned layers without a proportional rise in memory costs, we incorporate gradient low-rank projection, further boosting the approach's performance. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory. Code is available at https://github.com/pixeli99/OwLore.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses two major challenges in fine-tuning large language models (LLMs):

1. **Trade-off between parameter efficiency and performance**: Parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) are widely used, but they usually sacrifice performance compared with full-rank fine-tuning. The open question is how to reduce memory and computational costs while maintaining high performance.
2. **Effectiveness of layer sampling methods**: Existing layer sampling methods such as LISA reduce memory overhead to some extent, but they do not choose well which layers to fine-tune. LISA samples layers uniformly, which can leave performance below expectations. In addition, memory overhead grows significantly as the number of sampled layers increases.

To solve these problems, the paper proposes the **Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore)** method. OwLore improves the efficiency and performance of fine-tuning through two main innovations:

1. **Outlier-Weighed Sampling (OWS)**: The outlier distribution of each layer is analyzed, and layers containing more outliers receive higher sampling probabilities, so they are preferentially selected for fine-tuning. This improves performance and uses the available memory budget more effectively.
2. **Gradient Low-Rank Projection**: For the sampled layers, the gradient matrix is projected into a low-rank subspace, which significantly reduces the memory required during optimization. This allows more layers to be fine-tuned without a proportional rise in memory cost.
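The outlier-weighed sampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the outlier criterion used here (an entry whose magnitude exceeds `tau` times the layer's mean magnitude), the proportional normalization, and the helper names `outlier_ratio`, `sampling_probs`, and `sample_layers` are all assumptions made for illustration, and the "layers" are toy lists of numbers.

```python
import random

def outlier_ratio(weights, tau=3.0):
    """Fraction of entries whose magnitude exceeds tau times the layer's mean
    magnitude. This criterion is an illustrative assumption."""
    mags = [abs(w) for w in weights]
    mean = sum(mags) / len(mags)
    return sum(m > tau * mean for m in mags) / len(mags)

def sampling_probs(ratios, eps=1e-8):
    """Normalize per-layer outlier ratios into sampling probabilities.
    eps keeps layers without outliers at a tiny non-zero probability."""
    total = sum(r + eps for r in ratios)
    return [(r + eps) / total for r in ratios]

def sample_layers(probs, k, rng):
    """Draw up to k distinct layer indices, weighted by probs."""
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(min(k, len(probs))):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)

# Toy "layers": flat lists standing in for weight matrices (hypothetical data).
layers = [
    [0.1] * 99 + [5.0],      # one outlier
    [0.1] * 100,             # no outliers
    [0.1] * 96 + [5.0] * 4,  # four outliers
]
ratios = [outlier_ratio(w) for w in layers]
probs = sampling_probs(ratios)
picked = sample_layers(probs, k=2, rng=random.Random(0))
print(ratios, probs, picked)
```

In this toy setup the third layer has the highest outlier ratio and therefore the highest chance of being selected, whereas a uniform sampler such as the one attributed to LISA would treat all three layers alike.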
Through these innovations, OwLore outperforms baseline fine-tuning approaches, including full fine-tuning, across LLaMa2, LLaMa3, and Mistral, while being more memory efficient. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a 10% boost on MT-Bench, and it needs only 21GB of memory to fine-tune LLaMa2-7B.
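To see why projecting gradients into a low-rank subspace saves optimizer memory, the following back-of-the-envelope sketch counts the optimizer state kept for a single weight matrix. It rests on several assumptions not stated in the summary: an Adam-style optimizer that stores two moment tensors per parameter, a projection applied on one side so the projected gradient has shape `rank x n`, an `m x rank` projection matrix stored alongside it, and fp32 states. The 4096 x 4096 shape and rank 128 are illustrative values, and `adam_state_elements` is a hypothetical helper.

```python
def adam_state_elements(m, n, rank=None):
    """Count optimizer-state elements for one m x n weight matrix.

    Full-rank Adam keeps two moment tensors of shape m x n.
    With low-rank projection (assumed one-sided), the moments have shape
    rank x n, plus one m x rank projection matrix.
    """
    if rank is None:
        return 2 * m * n
    return 2 * rank * n + m * rank

BYTES_PER_ELEMENT = 4  # assumes fp32 optimizer states

full = adam_state_elements(4096, 4096)
low = adam_state_elements(4096, 4096, rank=128)

full_mib = full * BYTES_PER_ELEMENT / 2**20
low_mib = low * BYTES_PER_ELEMENT / 2**20
print(f"full-rank states: {full_mib:.0f} MiB")   # 128 MiB
print(f"projected states: {low_mib:.0f} MiB")    # 6 MiB
print(f"reduction: {full / low:.1f}x")           # 21.3x
```

Under these assumptions, the per-matrix optimizer state shrinks by roughly a factor of 21, which is the kind of saving that lets more layers be sampled within the same memory budget. The actual savings in OwLore depend on the rank, the layers sampled, and the optimizer configuration used in the paper.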