Sparse Matrix in Large Language Model Fine-tuning

Haoze He, Juncheng Billy Li, Xuan Jiang, Heather Miller

2024-05-30

Abstract: LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aims to minimize the performance gap between PEFT and full fine-tuning while also reducing both the computational cost and the memory cost of fine-tuning. Our Sparse Matrix Tuning (SMT) method begins by identifying the most significant sub-matrices in the gradient update and then updates only these blocks during the fine-tuning process. In our experiments, we demonstrate that SMT consistently surpasses other PEFT baselines (e.g., LoRA and DoRA) in fine-tuning popular large language models such as LLaMA across a broad spectrum of tasks, while reducing the GPU memory footprint by 67% compared to FT. We also examine how the performance of LoRA and DoRA tends to plateau and decline as the number of trainable parameters increases; in contrast, our SMT method does not suffer from such an issue.

Computation and Language

What problem does this paper attempt to address?

This paper primarily addresses the issues of computational efficiency and memory utilization during the fine-tuning of Large Language Models (LLMs). Specifically:

1. **Performance gap between Parameter-Efficient Fine-Tuning (PEFT) methods and Full Fine-Tuning (FT)**: Currently popular PEFT methods like LoRA and its variants can effectively reduce computational costs, but they usually do not perform as well as full fine-tuning. This paper attempts to narrow this gap by proposing a new Sparse Matrix Tuning (SMT) method.
2. **Computational cost and memory overhead**: As the scale of LLMs increases, the computational resources and memory overhead required for full fine-tuning become prohibitive. For example, fine-tuning a pre-trained LLaMA 7B model requires at least 58GB of GPU memory, making it impractical to fine-tune on consumer-grade GPUs. SMT aims to significantly reduce computational costs and memory overhead by selectively updating key sub-matrices.
3. **Performance saturation issue of low-rank adaptation methods**: The paper finds that even with an increase in the number of trainable parameters, existing low-rank adaptation methods like LoRA and DoRA experience performance saturation or even degradation. SMT avoids this phenomenon by dynamically adjusting task-related gradient information and achieves better performance with a small number of trainable parameters.

In summary, this paper aims to improve computational efficiency and memory utilization during the fine-tuning of LLMs while narrowing the performance gap between PEFT methods and full fine-tuning by proposing the SMT method.
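The core idea of updating only the most significant sub-matrices of a gradient can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the scoring rule (mean absolute gradient per block), the plain SGD update, the block size, and every function name below are assumptions chosen for illustration.

```python
# Illustrative sketch only. The scoring rule, the SGD update, and all names
# here are assumptions, not the SMT reference implementation.

def block_scores(grad, block):
    """Return {(block_row, block_col): mean |gradient|} for each sub-matrix."""
    rows, cols = len(grad), len(grad[0])
    scores = {}
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            vals = [abs(grad[i][j])
                    for i in range(bi, min(bi + block, rows))
                    for j in range(bj, min(bj + block, cols))]
            scores[(bi // block, bj // block)] = sum(vals) / len(vals)
    return scores


def select_blocks(grad, block, k):
    """Pick the k sub-matrices with the largest mean absolute gradient."""
    scores = block_scores(grad, block)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def sparse_sgd_step(weight, grad, selected, block, lr):
    """Apply an SGD update only to entries that fall inside selected blocks."""
    for i, row in enumerate(grad):
        for j, g in enumerate(row):
            if (i // block, j // block) in selected:
                weight[i][j] -= lr * g
    return weight


# Toy 4x4 gradient: the top-right 2x2 block has the largest magnitude.
grad = [
    [0.1, 0.1, 2.0, 2.0],
    [0.1, 0.1, 2.0, 2.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
weight = [[0.0] * 4 for _ in range(4)]

selected = select_blocks(grad, block=2, k=1)
weight = sparse_sgd_step(weight, grad, selected, block=2, lr=0.5)
```

In this toy run only the top-right block is selected, so the other twelve weights stay frozen. In a real training setup the frozen blocks would also need no gradient or optimizer state, which is where the memory savings described above would come from.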
