Mixed-Precision Quantization: Make the Best Use of Bits Where They Matter Most

Yiming Fang, Li Chen, Yunfei Chen, Weidong Wang
2024-12-04
Abstract: Mixed-precision quantization offers superior performance to fixed-precision quantization. It has been widely used in signal processing, communication systems, and machine learning. In mixed-precision quantization, bit allocation is essential. Hence, in this paper, we propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. Then we introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterations on infeasible solutions within the PPSO algorithm, a greedy criterion particle swarm optimization (GC-PSO) algorithm is proposed. The corresponding convergence analysis is derived based on dynamical system theory. Furthermore, we apply the above framework to some specific classic fields, i.e., finite impulse response (FIR) filters, receivers, and gradient descent. Numerical examples in each application underscore the superiority of the proposed framework to the existing algorithms.
Signal Processing, Information Theory
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **How should bits be allocated in mixed-precision quantization to best balance performance against complexity?** Specifically, the authors propose a search-based bit-allocation framework that handles the integer constraint in mixed-precision quantization, thereby improving performance in fields such as signal processing, communication systems, and machine learning.

### Main problem background

Traditional quantization methods usually apply a single fixed, low precision to every input. This is simple but not always optimal: different inputs need different numbers of quantization bits, so a uniform bit width can degrade performance. Mixed-precision quantization, which allocates a different number of quantization bits to each input, uses limited resources more effectively and achieves a better balance between performance and complexity.

### Specific problems in the paper

1. **Bit-allocation problem under integer constraints**:
   - The authors formalize bit allocation in mixed-precision quantization as an optimization problem with integer constraints.
   - The problem is expressed as:
     \[ (P1) \quad \min_{\{b_n\}_{n = 1}^N} F(b) \]
     \[ \text{s.t.} \quad C(b)\leq C(\bar{b}), \]
     \[ b_n\in B, \quad n = 1,2,\ldots,N, \]
     where \(F(b)\) is the objective function, \(C(b)\) is the consumption function, \(\bar{b}\) is the average number of quantization bits, and \(B\subseteq\mathbb{Z}\) is the set of allowed quantization bits.
2. **Design of an efficient search algorithm**:
   - Because the classical particle swarm optimization (PSO) algorithm cannot handle the integer constraint directly, the authors propose two modified PSO algorithms to solve the above problem: penalized particle swarm optimization (PPSO) and greedy criterion particle swarm optimization (GC-PSO).
3. **Specific application scenarios**:
   - The authors apply the proposed bit-allocation framework to several classical fields, including finite impulse response (FIR) filter design, receiver design, and the gradient descent algorithm, and demonstrate its superiority there.

### Summary

By introducing a new bit-allocation framework and modified PSO algorithms, this paper aims to solve the integer-constrained bit-allocation problem in mixed-precision quantization, thereby achieving better performance in fields such as signal processing, communication systems, and machine learning.
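
To make problem (P1) and the penalty idea concrete, here is a minimal Python sketch of a generic penalty-based PSO for integer bit allocation. It is not the authors' PPSO or GC-PSO, whose details are not given in this summary. Everything specific is an assumption made for illustration: the toy objective `F(b) = sum_n w_n * 2^(-2 b_n)`, the consumption `C(b) = sum_n b_n`, the linear penalty weight `rho`, the rounding step that maps positions to integers, and the choice to seed one particle with the fixed-precision allocation.

```python
import random


def objective(bits, weights):
    # Toy distortion model (assumption): the quantization noise of input n
    # scales as weights[n] * 2^(-2 * b_n).
    return sum(w * 2.0 ** (-2 * b) for w, b in zip(weights, bits))


def consumption(bits):
    # Toy consumption model (assumption): total number of bits used.
    return sum(bits)


def penalized_cost(bits, weights, budget, rho):
    # Objective plus a linear penalty on any budget violation.
    violation = max(0, consumption(bits) - budget)
    return objective(bits, weights) + rho * violation


def to_bits(position, b_min, b_max):
    # Map a continuous particle position to integer bit widths in [b_min, b_max].
    return [min(b_max, max(b_min, round(x))) for x in position]


def penalized_pso(weights, avg_bits, b_min=1, b_max=8, n_particles=30,
                  n_iters=200, inertia=0.7, c1=1.5, c2=1.5, rho=10.0, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    budget = avg_bits * n  # C(b_bar) under the toy consumption model

    pos = [[rng.uniform(b_min, b_max) for _ in range(n)]
           for _ in range(n_particles)]
    # Assumption: seed one particle with the fixed-precision allocation so
    # that at least one feasible candidate exists from the start.
    pos[0] = [float(int(avg_bits))] * n
    vel = [[0.0] * n for _ in range(n_particles)]

    def cost_of(p):
        return penalized_cost(to_bits(p, b_min, b_max), weights, budget, rho)

    pbest = [p[:] for p in pos]
    pbest_cost = [cost_of(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update: inertia, personal best, global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(b_max, max(b_min, pos[i][d] + vel[i][d]))
            cost = cost_of(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost

    return to_bits(gbest, b_min, b_max), gbest_cost


if __name__ == "__main__":
    weights = [8.0, 4.0, 2.0, 1.0]  # hypothetical per-input sensitivities
    bits, cost = penalized_pso(weights, avg_bits=4)
    print("bits:", bits, "total:", sum(bits), "cost:", cost)
```

In this toy setting the objective never exceeds 3.75, so with `rho = 10.0` any allocation that goes over budget by even one bit costs more than every feasible allocation. Together with the seeded fixed-precision particle, this means the returned allocation stays within the budget and is no worse than fixed precision under the assumed model. A penalty approach like this still spends iterations evaluating infeasible candidates, which is the inefficiency the abstract says GC-PSO is designed to avoid.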