AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models

Baisong Li, Xingwang Wang, Haixiao Xu
2023-11-12
Abstract: Large language models (LLMs) exhibit excellent performance across a variety of tasks, but they come with significant computational and storage costs. Quantizing these models is an effective way to alleviate this issue. However, existing methods struggle to strike a balance between model accuracy and hardware efficiency. We introduce AWEQ, a post-training method that requires no additional training overhead. AWEQ excels in both ultra-low-bit quantization and 8-bit weight and activation (W8A8) quantization. It builds on the observation that weight quantization is less challenging than activation quantization: AWEQ transfers the difficulty of activation quantization to weights using channel equalization, balancing the quantization difficulty of the two and thereby maximizing performance. We have further refined the equalization method to mitigate quantization bias error, ensuring the robustness of the model. Extensive experiments on popular models such as LLaMA and OPT demonstrate that AWEQ outperforms all existing post-training quantization methods for large models.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses efficient Post-Training Quantization (PTQ) for large language models (LLMs): reducing their computational and storage overhead while maintaining accuracy. Existing quantization methods struggle to balance model accuracy and hardware efficiency, especially when quantizing activations. To this end, the paper proposes a new post-training quantization method called AWEQ (Activation-Weight Equalization Quantization), which shifts quantization difficulty from activations to weights and introduces dynamic statistical bias correction (BC) to improve the performance and robustness of the quantized model.

The main contributions of AWEQ are:

1. **Ultra-low-bit quantization and 8-bit quantization**: AWEQ performs well in both ultra-low-bit quantization and 8-bit weight and activation (W8A8) quantization.
2. **No additional training overhead**: AWEQ is a post-training method and requires no additional training process.
3. **Reduced quantization error**: Through channel equalization, AWEQ reduces the waste of quantization grid points caused by outliers, retaining as much of the original model's information as possible.
4. **Dynamic statistical bias correction**: The BC method corrects the bias errors introduced during quantization, ensuring the robustness of the model.

The paper validates AWEQ through extensive experiments on the widely used LLaMA and OPT models, reporting that AWEQ outperforms existing post-training quantization methods on multiple tasks.
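
The channel equalization and bias correction ideas described above can be illustrated with a minimal sketch. This summary gives no formulas, so everything specific here is an assumption rather than AWEQ's actual procedure: the scale rule `s_j = sqrt(max|X_j| / max|W_j|)` is a generic range-balancing choice, the bias correction is a plain static mean subtraction (not the paper's dynamic statistical variant), and all function names (`equalization_scales`, `equalize`, `fake_quantize`, `mean_output_shift`) are hypothetical. The sketch uses only the Python standard library and a toy two-channel layer.

```python
import math

def matmul(a, b):
    """Plain matrix product for small lists of rows."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*b)] for row in a]

def equalization_scales(acts, weights, eps=1e-8):
    """Assumed rule: s_j = sqrt(max|X_j| / max|W_j|) for each input channel j.

    acts is tokens x channels; weights is channels x outputs.
    """
    scales = []
    for j, w_row in enumerate(weights):
        act_range = max(abs(row[j]) for row in acts)
        w_range = max(abs(w) for w in w_row)
        scales.append(math.sqrt(max(act_range, eps) / max(w_range, eps)))
    return scales

def equalize(acts, weights, scales):
    """Divide activation channel j by s_j and multiply weight row j by s_j."""
    acts_eq = [[x / s for x, s in zip(row, scales)] for row in acts]
    weights_eq = [[w * s for w in w_row] for w_row, s in zip(weights, scales)]
    return acts_eq, weights_eq

def fake_quantize(matrix, bits=8):
    """Symmetric per-tensor quantize-then-dequantize (an illustrative quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(v) for row in matrix for v in row) / qmax
    return [[round(v / step) * step for v in row] for row in matrix]

def mean_output_shift(acts, weights, weights_q):
    """Per-output-channel mean of (quantized output - reference output)."""
    ref = matmul(acts, weights)
    out = matmul(acts, weights_q)
    return [sum(o[k] - r[k] for o, r in zip(out, ref)) / len(acts)
            for k in range(len(ref[0]))]

# Toy calibration batch: channel 1 carries an activation outlier.
acts = [[0.5, 40.0], [-1.0, -32.0], [0.8, 36.0]]
weights = [[0.2, -0.4], [0.1, 0.05]]

# Equalization moves range from the activations into the weights.
scales = equalization_scales(acts, weights)
acts_eq, weights_eq = equalize(acts, weights, scales)

# Bias correction: subtract the mean shift caused by 4-bit weight quantization.
weights_q = fake_quantize(weights_eq, bits=4)
shift = mean_output_shift(acts_eq, weights_eq, weights_q)
corrected = [[y - b for y, b in zip(row, shift)]
             for row in matmul(acts_eq, weights_q)]
```

Because each activation channel is divided by the same factor its weight row is multiplied by, the layer output is unchanged before quantization, while the activation and weight ranges of every channel become equal. The mean subtraction then removes the average output shift measured on the calibration batch; whether this reduces error on unseen data depends on how representative that batch is.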