Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs. Unlike LoRA, which adds extra adapters to all layers, OwLore strategically assigns higher sampling probabilities to layers with more outliers, selectively sampling only a few layers and fine-tuning their pre-trained weights. To further increase the number of fine-tuned layers without a proportional rise in memory costs, we incorporate gradient low-rank projection, further boosting the approach's performance. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory. Code is available at https://github.com/pixeli99/OwLore.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses two major challenges in fine-tuning large language models (LLMs):

1. **Trade-off between parameter efficiency and performance**: Parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) are widely used, but they usually sacrifice performance compared with full-rank fine-tuning. Reducing memory and computational costs while maintaining high performance is therefore an important research direction.
2. **Effectiveness of layer sampling methods**: Existing layer sampling methods such as LISA reduce memory overhead to some extent, but they do not choose well which layers to fine-tune. LISA, for example, samples layers uniformly, which can leave performance below expectations. In addition, memory overhead grows significantly as the number of sampled layers increases.

To solve these problems, the paper proposes the **Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore)** method. OwLore improves the efficiency and performance of fine-tuning through two main innovations:

1. **Outlier-Weighed Sampling (OWS) for layer sampling**: The outlier distribution of each layer is analyzed, and layers containing more outliers are given higher priority when selecting layers to fine-tune. This improves performance and uses the available memory more effectively.
2. **Gradient low-rank projection**: For the sampled layers, the gradient matrix is projected into a low-rank subspace, which significantly reduces the memory required during optimization. This allows more layers to be fine-tuned without a proportional rise in memory cost.
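As a rough illustration of why projecting gradients can save optimizer memory, the sketch below counts the elements an Adam-style optimizer would keep for a single weight matrix, with and without projection. The matrix shape, the rank, the choice of an Adam-style optimizer, and the function names are all assumptions made for illustration, not values taken from the paper. The accounting assumes an `m x r` projector that maps an `m x n` gradient to an `r x n` matrix, with optimizer moments stored only at the projected size.

```python
def full_rank_state_elements(m, n):
    """Elements kept by an Adam-style optimizer for a full m x n gradient.

    Two moment estimates (first and second), each the size of the gradient.
    """
    return 2 * m * n


def projected_state_elements(m, n, r):
    """Elements kept when the gradient is projected to rank r (assumed setup).

    projector: m x r matrix P
    moments:   two r x n matrices, tracked for the projected gradient P^T G
    """
    if not 0 < r <= min(m, n):
        raise ValueError("rank r must satisfy 0 < r <= min(m, n)")
    projector = m * r
    moments = 2 * r * n
    return projector + moments


# Hypothetical shape and rank, chosen only to make the arithmetic concrete.
m, n, r = 4096, 4096, 128

full = full_rank_state_elements(m, n)    # 33,554,432 elements
low = projected_state_elements(m, n, r)  # 1,572,864 elements
ratio = full / low                       # about 21.3x fewer state elements

print(f"full-rank states: {full:,}")
print(f"projected states: {low:,}")
print(f"reduction factor: {ratio:.1f}x")
```

Under these assumed numbers the optimizer state for one matrix shrinks by roughly a factor of 21. The weights and the gradient itself are not counted here, so the end-to-end saving for a whole model would be smaller.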
With these innovations, OwLore outperforms existing fine-tuning methods on multiple benchmarks while using less memory. According to the abstract, it achieves up to a 1.1% average accuracy gain on Commonsense Reasoning, a 3.0% improvement on MMLU, and a 10% boost on MT-Bench, and it needs only 21GB of memory to fine-tune LLaMa2-7B.
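The outlier-weighed sampling idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-layer outlier ratios are made-up numbers, how outliers are detected is left out entirely, and drawing layers one at a time without replacement is just one plausible way to turn the weights into a selection. The function names `sampling_probabilities` and `sample_layers` are hypothetical.

```python
import random


def sampling_probabilities(outlier_ratios):
    """Normalize per-layer outlier ratios into sampling probabilities."""
    total = sum(outlier_ratios)
    if total <= 0:
        raise ValueError("outlier ratios must sum to a positive value")
    return [ratio / total for ratio in outlier_ratios]


def sample_layers(outlier_ratios, k, rng):
    """Draw k distinct layer indices, favouring layers with more outliers.

    Assumes every ratio is positive. Layers are drawn one at a time,
    weighted by their outlier ratio, and removed once picked.
    """
    if not 0 < k <= len(outlier_ratios):
        raise ValueError("k must be between 1 and the number of layers")
    remaining = list(range(len(outlier_ratios)))
    chosen = []
    for _ in range(k):
        weights = [outlier_ratios[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)


# Hypothetical outlier ratios for a 4-layer model (illustrative only).
ratios = [0.02, 0.10, 0.05, 0.03]
probs = sampling_probabilities(ratios)

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
picked = sample_layers(ratios, k=2, rng=rng)

print("sampling probabilities:", [round(p, 2) for p in probs])
print("layers selected for this step:", picked)
```

Only the selected layers would have their pre-trained weights updated in a given step; the rest stay frozen. Uniform sampling, as attributed to LISA above, corresponds to passing equal ratios for every layer.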

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
