How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training

Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag Patel, Markus Nagel
2024-04-25
Abstract: This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training: (1) scale and offset, (2) minimum and maximum, and (3) beta and gamma. We perform a comprehensive comparative analysis of these parameterizations' influence on quantization-aware training, using both controlled experiments and real-world large language models. Our particular focus is on their changing behavior in response to critical training hyperparameters, bit width and learning rate. Based on our investigation, we propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper studies how to parameterize learnable asymmetric quantization ranges in Quantization-Aware Training (QAT). It compares three parameterizations: scale and offset, min and max, and beta and gamma. By analyzing how each one behaves during training, the paper derives best practices for stabilizing and accelerating QAT. Specifically, it reports that the scale and offset parameterization can become unstable at high bit widths, whereas the min and max parameterization is more robust to changes in bit width and learning rate. The beta and gamma parameterization performs well for extremely low-bit quantization, especially when fast convergence is required. These findings are supported by controlled experiments and by experiments on large language models.
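
To make the three parameterizations concrete, the following is a minimal Python sketch of how each could describe the same asymmetric uniform quantizer. It is an illustration under stated assumptions, not the paper's implementation. The function names are hypothetical, and the quantizer uses one common convention (an unsigned integer grid with an integer offset). The page above does not define beta and gamma; the sketch assumes they are multiplicative factors applied to a fixed initial minimum and maximum.

```python
def scale_offset_from_minmax(x_min, x_max, bits):
    """Convert a (min, max) range into (scale, offset) for an unsigned grid.

    Assumed convention: the grid has 2**bits - 1 steps between x_min and x_max.
    """
    levels = 2 ** bits - 1
    scale = (x_max - x_min) / levels
    offset = round(-x_min / scale)  # integer grid index that represents 0.0
    return scale, offset


def minmax_from_beta_gamma(beta, gamma, init_min, init_max):
    """ASSUMPTION: beta and gamma rescale a fixed initial range.

    This definition is a guess made for illustration only.
    """
    return beta * init_min, gamma * init_max


def fake_quantize(x, scale, offset, bits):
    """Quantize a float to the integer grid, clip, and map back to a float."""
    q = round(x / scale) + offset
    q = max(0, min(2 ** bits - 1, q))  # clip to [0, 2**bits - 1]
    return scale * (q - offset)


bits = 8

# Parameterization 2 (min and max), converted to parameterization 1.
scale, offset = scale_offset_from_minmax(-1.0, 2.0, bits)

# Parameterization 3 (beta and gamma) with both factors at 1.0
# reproduces the initial range, so the forward pass is identical.
lo, hi = minmax_from_beta_gamma(1.0, 1.0, -1.0, 2.0)
scale_bg, offset_bg = scale_offset_from_minmax(lo, hi, bits)

print(fake_quantize(0.3, scale, offset, bits))
print(fake_quantize(0.3, scale_bg, offset_bg, bits))
```

In this sketch all three parameterizations produce the same forward pass. They differ only in which quantities would be treated as learnable, and therefore in how gradient updates move the range during training. Actual QAT would also need a straight-through estimator for the rounding step, which is omitted here.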