RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy

Geonho Lee, Janghwan Lee, Sukjin Hong, Minsoo Kim, Euijai Ahn, Du-Seong Chang, Jungwook Choi
2024-12-02
Abstract: Low-rank adaptation (LoRA) has become the dominant method for parameter-efficient LLM fine-tuning, with LoRA-based quantization error compensation (LQEC) emerging as a powerful tool for recovering accuracy in compressed LLMs. However, LQEC has underperformed in sub-4-bit scenarios, with no prior investigation into understanding this limitation. We propose RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) to understand this fundamental limitation and boost 2-bit LLM accuracy. Based on rank analysis revealing model-wise activation discrepancy loss's rank-insensitive nature, RILQ employs this loss to adjust adapters cooperatively across layers, enabling robust error compensation with low-rank adapters. Evaluations on LLaMA-2 and LLaMA-3 demonstrate RILQ's consistent improvements in 2-bit quantized inference across various state-of-the-art quantizers and enhanced accuracy in task-specific fine-tuning. RILQ maintains computational efficiency comparable to existing LoRA methods, enabling adapter-merged weight-quantized LLM inference with significantly enhanced accuracy, making it a promising approach for boosting 2-bit LLM performance.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the accuracy degradation of large language models (LLMs) under low-precision quantization, especially 2-bit quantization. Although low-rank adaptation (LoRA) and LoRA-based quantization error compensation (LQEC) methods perform well at 4-bit and higher precision, their effectiveness drops significantly at 2 bits. The paper points out that existing methods need a higher rank to compensate for 2-bit quantization errors effectively, which contradicts the low-rank premise of LoRA.

### Main contributions of the paper

1. **Propose the RILQ method**:
   - RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) is a new quantization error compensation method that aims to overcome the high-rank requirement of 2-bit quantization by using a model-level loss function (Model-Loss).
   - RILQ achieves more effective quantization error compensation by adjusting adapters globally and balancing compensation across layers.
2. **Analyze the characteristics of quantization errors**:
   - The paper shows experimentally that the errors introduced by 2-bit quantization are essentially high-rank, so existing SVD-based low-rank adaptation techniques struggle to compensate for them.
   - It proposes a rank-sensitivity analysis, revealing the impact of the quantization error range on the performance of LQEC.
3. **Experimental verification**:
   - The effectiveness of RILQ is evaluated on multiple benchmark datasets, including commonsense question-answering tasks (such as WinoGrande, PIQA, and HellaSwag) and arithmetic reasoning (GSM8K).
   - The experimental results show that RILQ significantly improves model accuracy under 2-bit quantization and also performs well in task-specific fine-tuning.
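The claim that 2-bit quantization error is high-rank can be illustrated with a toy experiment. The sketch below is not the paper's analysis: it assumes a synthetic Gaussian weight matrix and a plain per-tensor min-max quantizer (both stand-ins, as are the helper names `minmax_quantize` and `residual_after_rank`), and it measures how much of the error \(W - Q\) remains after subtracting its best rank-\(r\) approximation obtained from the SVD.

```python
import numpy as np

def minmax_quantize(W, bits):
    """Per-tensor uniform min-max quantizer (a stand-in, not one of the paper's quantizers)."""
    levels = 2 ** bits - 1
    lo, hi = W.min(), W.max()
    s = (hi - lo) / levels
    codes = np.clip(np.round((W - lo) / s), 0, levels)
    return codes * s + lo

def residual_after_rank(E, r):
    """Fraction of ||E||_F that remains after removing the best rank-r approximation of E."""
    sv = np.linalg.svd(E, compute_uv=False)
    return np.sqrt(np.sum(sv[r:] ** 2) / np.sum(sv ** 2))

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))  # synthetic weights, not a real LLM layer

# Quantization error at 2 and 4 bits.
errors = {bits: W - minmax_quantize(W, bits) for bits in (2, 4)}

for bits, E in errors.items():
    fractions = [round(float(residual_after_rank(E, r)), 3) for r in (8, 32, 128)]
    print(f"{bits}-bit: ||E||_F = {np.linalg.norm(E):.2f}, residual at r=8/32/128: {fractions}")
```

In this synthetic setting the error behaves like noise, so a small rank removes only a modest share of it, and the absolute error is much larger at 2 bits than at 4 bits. Real LLM weights and the quantizers evaluated in the paper may behave differently.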
### Formula representation

The formulas involved in the paper are as follows:

- **Quantization weight formula**:
  \[ Q_b = s\cdot\left(\text{clamp}\left(\left\lfloor\frac{W}{s}\right\rfloor - z,\,0,\,2^b - 1\right)+z\right) \]
  where \(b\) is the bit width and
  \[ s=\frac{\gamma\max(W)-\beta\min(W)}{2^b - 1},\quad z = \left\lfloor\frac{\beta\min(W)}{s}\right\rfloor \]
- **LoRA forward operation**:
  \[ Y = X(W + L_1L_2^T) \]
- **Optimization objective**:
  \[ \arg\min_{L_1,L_2}\|Y^N - Y_q^N\|_F \]
  where \(Y^N\) is the full-precision activation output and \(Y_q^N\) is the quantized activation output.

Through these improvements, RILQ can significantly improve the accuracy of LLMs under 2-bit quantization while maintaining computational efficiency.
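The three formulas in this section can be tied together in a small NumPy sketch. It rests on several assumptions that do not come from the paper: the clamp upper bound is taken as \(2^b - 1\) and the zero-point is added back before rescaling by \(s\), so that the result returns to the range of \(W\); floor rounding mirrors the formula as written (round-to-nearest is the more common choice); the adapters are initialized from a truncated SVD of \(W - Q\), as in SVD-based LQEC; and the "model" is a hypothetical stack of three linear layers with ReLU. The gradient-based minimization over \(L_1, L_2\) is left out; the code only evaluates the output discrepancy \(\|Y^N - Y_q^N\|_F\).

```python
import numpy as np

def quantize(W, b, gamma=1.0, beta=1.0):
    """Uniform b-bit quantize-dequantize; floor rounding mirrors the formula as written."""
    s = (gamma * W.max() - beta * W.min()) / (2 ** b - 1)
    z = np.floor(beta * W.min() / s)
    codes = np.clip(np.floor(W / s) - z, 0, 2 ** b - 1)
    return s * (codes + z)  # assumption: zero-point is added back before rescaling

def svd_adapter(E, r):
    """Rank-r factors whose product L1 @ L2.T is the best rank-r approximation of E."""
    U, sv, Vt = np.linalg.svd(E, full_matrices=False)
    root = np.sqrt(sv[:r])
    return U[:, :r] * root, Vt[:r].T * root

def forward(X, weights):
    """Hypothetical toy model: linear layers with ReLU between them."""
    for i, W in enumerate(weights):
        X = X @ W
        if i < len(weights) - 1:
            X = np.maximum(X, 0.0)
    return X

rng = np.random.default_rng(0)
d, r, b = 64, 8, 2
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]  # synthetic weights
X = rng.standard_normal((32, d))                                   # synthetic inputs

Qs = [quantize(W, b) for W in Ws]
adapters = [svd_adapter(W - Q, r) for W, Q in zip(Ws, Qs)]
# LoRA forward Y = X (Q + L1 L2^T), written here with the adapter merged into the weight.
merged = [Q + L1 @ L2.T for Q, (L1, L2) in zip(Qs, adapters)]

Y_fp = forward(X, Ws)
loss_quant = np.linalg.norm(Y_fp - forward(X, Qs))      # no compensation
loss_lqec = np.linalg.norm(Y_fp - forward(X, merged))   # SVD-initialized adapters
print(f"output discrepancy without adapters: {loss_quant:.3f}")
print(f"output discrepancy with adapters:    {loss_lqec:.3f}")
```

Because floor rounding always rounds down, the error \(W - Q\) in this sketch carries a large constant offset that a low-rank adapter absorbs easily, so the improvement seen here overstates what a rank-8 adapter would recover with a round-to-nearest quantizer.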