Adaptive quantization with mixed-precision based on low-cost proxy

Junzhe Chen, Qiao Yang, Senmao Tian, Shunli Zhang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447866
2024-02-28
Abstract: It is critical to deploy complicated neural network models on hardware with limited resources. This paper proposes a novel model quantization method, named the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which contains three key modules. The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module is developed to evaluate the quantization sensitivity by using the Hessian matrix and Pareto frontier techniques. Integer linear programming is used to fine-tune the quantization across different layers. Then the low-cost proxy neural architecture search module efficiently explores the ideal quantization hyperparameters. Experiments on ImageNet demonstrate that the proposed LCPAQ achieves comparable or superior quantization accuracy to existing mixed-precision models. Notably, LCPAQ achieves 1/200 of the search time compared with existing methods, which provides a shortcut in practical quantization use for resource-limited devices.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenge of deploying complex neural network models on hardware with limited resources. It proposes a new model quantization method, Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which aims to reduce computation time and energy consumption while maintaining network accuracy. The method balances model size, computational requirements, and accuracy by assigning different bit widths to different layers.

LCPAQ consists of three key modules:

1. **Hardware-Aware Module**: Accounts for hardware limitations in the design so that the quantized model is optimized for the target hardware.
2. **Adaptive Mixed-Precision Quantization Module**: Uses the Hessian matrix and Pareto frontier techniques to evaluate quantization sensitivity, and uses integer linear programming (ILP) to fine-tune the quantization of different layers.
3. **Low-Cost Proxy Neural Architecture Search Module**: Efficiently explores the ideal quantization hyperparameters and accelerates the search process.

Experimental results on the ImageNet dataset show that LCPAQ achieves quantization accuracy comparable or superior to that of existing mixed-precision models. In particular, LCPAQ requires only 1/200 of the search time of existing methods, which significantly improves efficiency in practical applications.
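
To make the idea of per-layer bit-width assignment concrete, here is a minimal sketch of the kind of constrained selection problem that ILP-based mixed-precision methods solve. It is not the authors' implementation: the function name `allocate_bits`, the sensitivity scores, the layer sizes, and the size budget are all hypothetical, and a brute-force search stands in for a real ILP solver so the example needs only the standard library.

```python
from itertools import product

def allocate_bits(sensitivity, num_params, budget_bits, choices=(2, 4, 8)):
    """Choose one bit width per layer to minimize total sensitivity cost
    while keeping the model size (in bits) within budget_bits.

    sensitivity[i][b] is an assumed cost of quantizing layer i to b bits
    (for example, a Hessian-based score); lower is better.
    Exhaustive search is used here in place of an ILP solver.
    """
    best_cost, best_assign = float("inf"), None
    for assign in product(choices, repeat=len(num_params)):
        size = sum(n * b for n, b in zip(num_params, assign))
        if size > budget_bits:
            continue  # violates the hardware size constraint
        cost = sum(sensitivity[i][b] for i, b in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost

# Hypothetical three-layer model: made-up sensitivity scores and sizes.
sensitivity = [
    {2: 90, 4: 20, 8: 1},  # small but sensitive layer
    {2: 10, 4: 3, 8: 1},   # large, robust layer
    {2: 40, 4: 10, 8: 1},  # medium layer
]
num_params = [1000, 4000, 2000]
budget_bits = 4 * sum(num_params)  # average of 4 bits per weight

assignment, cost = allocate_bits(sensitivity, num_params, budget_bits)
print(assignment, cost)
```

With these made-up numbers the search keeps the small, sensitive layer at 8 bits and pushes the large, robust layer down to 2 bits, which is the trade-off mixed-precision quantization is meant to exploit. Exhaustive search grows exponentially with the number of layers, so a realistic network would need an actual ILP solver instead.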