Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng
2024-06-24
Abstract: Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful deduction capacity. However, their computational and storage costs are staggering, which has made quantization a popular topic. To address the accuracy decay caused by quantization, two streams of post-training quantization methods stand out: one uses other weights to compensate for existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping the weights compensate for each other conditioned on the activations. Incorporating LSI with existing techniques, we achieve state-of-the-art performance in diverse quantization settings, whether weight-only, weight-activation, or extremely low-bit. With LSI, efficient fine-tuning of quantized models is no longer a prohibitive problem.
Computation and Language, Artificial Intelligence
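The core LSI idea in the abstract (decompose a weight matrix with SVD, then learn an increment on its singular values so that the quantized weights better reproduce the layer's outputs on calibration activations) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the min-max quantizer, the mean-squared output loss, the straight-through estimator (STE) for rounding, the step size, the random stand-in data, and the names `fake_quant` and `lsi_sketch` are all hypothetical choices.

```python
import numpy as np


def fake_quant(w, bits):
    """Asymmetric per-output-channel min-max fake quantization (assumed quantizer)."""
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / (2**bits - 1)
    return np.round((w - lo) / scale) * scale + lo


def lsi_sketch(W, X, bits=3, lr=2.0, steps=200):
    """Learn an increment d on the singular values of W so that the quantized
    weight reproduces the full-precision output X @ W.T more closely.

    Hypothetical sketch: plain gradient descent with an STE for rounding,
    not the paper's training recipe. Returns (best increment, best loss).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # W = U @ diag(s) @ Vt
    Y = X @ W.T  # full-precision outputs on the calibration activations

    def loss_and_grad(d):
        W_new = (U * (s + d)) @ Vt  # U @ diag(s + d) @ Vt
        R = X @ fake_quant(W_new, bits).T - Y  # output residual after quantization
        G = 2.0 * R.T @ X / R.size  # dLoss/dW_q; the STE passes it straight to W_new
        grad_d = np.einsum("ik,ij,kj->k", U, G, Vt)  # dLoss/dd_k = u_k^T G v_k
        return np.mean(R**2), grad_d

    d = np.zeros_like(s)  # learnable singular value increment, starts at zero
    best_d, best_loss = d.copy(), np.inf
    for _ in range(steps):
        loss, grad = loss_and_grad(d)
        if loss < best_loss:  # STE steps are not monotone, so keep the best iterate
            best_d, best_loss = d.copy(), loss
        d = d - lr * grad
    return best_d, best_loss


rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64)) / 8.0  # stand-in weight (out=32, in=64)
X = rng.standard_normal((256, 64))  # stand-in calibration activations
base = np.mean((X @ fake_quant(W, 3).T - X @ W.T) ** 2)  # plain round-to-nearest
d, tuned = lsi_sketch(W, X, bits=3)
print(f"RTN loss {base:.5f} -> LSI-style loss {tuned:.5f}")
```

Because rounding makes the loss piecewise constant, the STE gradient is only a surrogate; the sketch therefore keeps the best increment it has seen, so the returned loss is no worse (up to floating-point noise) than plain round-to-nearest on the same data. Only the `min(out, in)` singular values are trained, which is far fewer parameters than the weight matrix itself.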
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper primarily addresses the following issues:

1. **Quantization Error Compensation**: Reducing the error introduced by quantization by letting the weights within layers compensate for each other. Specifically, the paper introduces a technique called "Learnable Singular Value Increment (LSI)", which uses singular value decomposition to extract the singular values of the weights and makes them learnable, conditioned on the activations, so that the weights can compensate for each other.
2. **Efficient Fine-Tuning**: With LSI, efficient fine-tuning of quantized models is no longer prohibitive. The method not only improves the performance of quantized models but also achieves significant gains with a small amount of data.
3. **Optimizing Quantization Schemes**: Combined with existing techniques, LSI achieves state-of-the-art performance across quantization settings: weight-only quantization, weight-activation quantization, and extremely low-bit quantization.

Through these methods, the paper aims to provide a solution that effectively handles quantization error while maintaining high performance across different quantization scenarios.
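To make the three quantization settings concrete, the sketch below fake-quantizes a toy linear layer under weight-only, weight-activation, and extremely low-bit configurations and reports the relative output error. It assumes asymmetric min-max round-to-nearest quantization (per output channel for weights, per tensor for activations) and the common `WxAy` naming, where "A16" here simply means the activations are left unquantized. These choices and the helper names are illustrative, not taken from the paper, and no LSI compensation is applied.

```python
import numpy as np


def fake_quant(x, bits, axis=None):
    """Asymmetric min-max fake quantization: per row if axis=1, per tensor if None."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / (2**bits - 1)
    return np.round((x - lo) / scale) * scale + lo


def output_error(W, X, w_bits, a_bits=None):
    """Relative output error of Y = X @ W.T under fake quantization.

    a_bits=None means activations stay in floating point (weight-only setting).
    """
    Wq = fake_quant(W, w_bits, axis=1)  # per-output-channel weight quantization
    Xq = X if a_bits is None else fake_quant(X, a_bits)  # per-tensor activations
    Y = X @ W.T
    return np.linalg.norm(Xq @ Wq.T - Y) / np.linalg.norm(Y)


rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))  # stand-in weight (out=32, in=64)
X = rng.standard_normal((256, 64))  # stand-in activations
settings = {
    "W8A16": (8, None),  # weight-only
    "W4A16": (4, None),  # weight-only
    "W2A16": (2, None),  # extremely low-bit, weight-only
    "W8A8": (8, 8),  # weight-activation
    "W4A4": (4, 4),  # weight-activation
}
errors = {name: output_error(W, X, wb, ab) for name, (wb, ab) in settings.items()}
for name, err in errors.items():
    print(f"{name}: relative output error {err:.4f}")
```

On random data like this, the error grows as the bit width shrinks, and quantizing activations typically adds error on top of the weight error; this is the accuracy decay that compensation methods such as LSI aim to reduce.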