Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

Yifei Gao, Jie Ou, Lei Wang, Fanhua Shang, Jaji Wu, Jun Cheng
2024-08-15
Abstract: Large Language Models (LLMs) showcase remarkable performance and robust deductive capabilities, yet their expansive size complicates deployment and raises environmental concerns due to substantial resource consumption. The recent development of a quantization technique known as Learnable Singular-value Increment (LSI) has addressed some of these quantization challenges. Leveraging insights from LSI and our extensive research, we have developed innovative methods that enhance the performance of quantized LLMs, particularly in low-bit settings. Our methods consistently deliver state-of-the-art results across various quantization scenarios and offer deep theoretical insights into the quantization process, elucidating the potential of quantized models for widespread application.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses error compensation in the quantization of large language models (LLMs). The existing Learnable Singular-value Increment (LSI) technique has made progress in quantization, but its theoretical analysis is limited and it does not fully resolve some "anomalous behavior" issues. The paper therefore proposes a new method, Diagonal Extension of Learnable Singular Values (DESV), which introduces additional learnable parameters to improve how the linear weight matrix is adjusted during quantization, so that the weights adapt better to the quantization setting. According to the paper, the method achieves state-of-the-art results across various quantization scenarios and also reveals intrinsic properties of quantized models that explain why they perform well on downstream tasks, a point existing methods have not addressed. In short, the paper aims to:
1. Formulate the quantization of the linear weight matrix as an inequality-solving problem and propose strategies that exploit this formulation to achieve better quantization results.
2. Propose the DESV method, which enhances the adjustment capability of the LSI technique by adding more learnable parameters to the diagonal singular-value matrix.
3. Demonstrate the method's superior performance in low-bit quantization scenarios, and explain how quantized models behave on certain downstream tasks and why their side effects arise.
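
To make the idea of adjusting a weight matrix through its singular values concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The summary above does not specify how DESV places its extra parameters, so the banded layout in the hypothetical helper `banded_delta` (main diagonal plus `band` neighbouring diagonals, with `band=0` meaning increments on the singular values only) is one plausible reading. The quantizer is plain symmetric round-to-nearest, and a simple random search stands in for whatever training procedure the paper actually uses. All function names are invented for this example.

```python
import numpy as np


def quantize(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (returned dequantized)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def banded_delta(params, n, band):
    """Hypothetical layout: spread params over the main diagonal and
    `band` diagonals on each side of it (band=0 is the diagonal only)."""
    delta = np.zeros((n, n))
    idx = 0
    for k in range(-band, band + 1):
        m = n - abs(k)  # length of the k-th diagonal
        delta += np.diag(params[idx:idx + m], k)
        idx += m
    return delta


def adjusted_weight(u, s, vt, params, band):
    """Rebuild W from its SVD with a learnable adjustment added to diag(s)."""
    return u @ (np.diag(s) + banded_delta(params, len(s), band)) @ vt


def output_error(x, w, w_adj, bits):
    """Distance between full-precision outputs and outputs of the quantized weight."""
    return np.linalg.norm(x @ w - x @ quantize(w_adj, bits))


def random_search(x, w, band, bits=3, steps=200, seed=0):
    """Stand-in for gradient training: keep a random perturbation only if it helps."""
    rng = np.random.default_rng(seed)
    u, s, vt = np.linalg.svd(w)  # assumes a square weight matrix
    n = len(s)
    n_params = sum(n - abs(k) for k in range(-band, band + 1))
    params = np.zeros(n_params)  # zero adjustment reproduces W itself
    start = output_error(x, w, adjusted_weight(u, s, vt, params, band), bits)
    best = start
    for _ in range(steps):
        cand = params + 0.01 * rng.standard_normal(n_params)
        err = output_error(x, w, adjusted_weight(u, s, vt, cand, band), bits)
        if err < best:
            params, best = cand, err
    return start, best, n_params


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.standard_normal((8, 8))   # toy weight matrix
    x = rng.standard_normal((16, 8))  # toy calibration inputs
    for band in (0, 1):
        start, best, n_params = random_search(x, w, band)
        print(f"band={band} params={n_params} error {start:.4f} -> {best:.4f}")
```

Because a candidate is accepted only when it lowers the output error, the final error can never exceed the round-to-nearest baseline. How much the extra diagonals help on this toy problem depends on the random data and says nothing about the paper's reported results.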