A method for accelerating low precision operations by sparse matrix multiplication

Hongyaoxing Gu
2024-03-12
Abstract: In recent years, the fervent demand for computational power across various domains has prompted hardware manufacturers to introduce specialized computing hardware aimed at enhancing computational capabilities. Particularly, the utilization of tensor hardware supporting low precision has gained increasing prominence in scientific research. However, the use of low-precision tensor hardware for computational acceleration often introduces errors, posing a fundamental challenge of simultaneously achieving effective acceleration while maintaining computational accuracy.
Numerical Analysis
### What problem does this paper attempt to address?

This paper addresses how to achieve efficient acceleration on low-precision tensor hardware while maintaining the accuracy of the results. Specifically, the author identifies the following key problems and solutions:

1. **Error control in low-precision calculation**:
   - Using low-precision types (such as 8-bit or 4-bit integers) for matrix multiplication can significantly improve speed, but it introduces quantization errors that reduce the accuracy of the results.
   - The paper proposes a hybrid precision quantization method, combined with residual compensation quantization, to reduce quantization errors.
2. **Application of sparse matrices**:
   - Sparse matrices are introduced to reduce the computational cost. By restricting attention to the values that may significantly affect the relative error, the amount of computation is reduced while the quantization error stays under control.
   - Specifically, the author replaces dense matrix multiplication with sparse matrix multiplication to avoid unnecessary calculations.
3. **Acceleration of low-precision matrix multiplication**:
   - A threshold-based method is proposed to control the amount of computation in low-precision matrix multiplication, keeping the computation efficient within an acceptable error range.
   - A high-performance low-precision quantization algorithm is designed, and its effectiveness is verified through a series of experiments.
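The threshold-based sparsification described in the summary can be illustrated with a small, dependency-free Python sketch. It is not the paper's algorithm: the selection rule here (keep entries whose absolute value exceeds a fixed `threshold`) and the helper names `sparsify` and `sparse_dense_matmul` are assumptions made for illustration, and the paper's actual criterion for "values that may significantly affect the relative error" is not given in this summary.

```python
def sparsify(matrix, threshold):
    """Keep only entries with |value| > threshold.

    Returns a dictionary-of-keys sparse form: {(row, col): value}.
    Assumption: a plain magnitude cut-off stands in for the paper's criterion.
    """
    return {
        (i, j): v
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if abs(v) > threshold
    }


def sparse_dense_matmul(sparse, n_rows, dense):
    """Multiply a dictionary-of-keys sparse matrix by a dense list-of-lists matrix.

    The work is proportional to (number of stored entries) x (columns of dense),
    so dropping small entries directly reduces the number of multiply-adds.
    """
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, k), v in sparse.items():
        dense_row = dense[k]
        out_row = out[i]
        for j in range(n_cols):
            out_row[j] += v * dense_row[j]
    return out


# Hypothetical residual matrix: a few large entries, the rest negligible.
residual = [
    [0.004, -0.0001, 0.0],
    [0.00005, -0.003, 0.0002],
]
dense = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

kept = sparsify(residual, threshold=0.001)
product = sparse_dense_matmul(kept, len(residual), dense)
print(len(kept), "of 6 entries kept")
print(product)
```

Raising the threshold trades accuracy for fewer multiply-adds, which is the general trade-off the summary attributes to the threshold-based method.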
### Formula summary

- **Quantization formula**, where \(N\) is the bit width of the target integer type:
  \[ a_{\text{int}} = Q(a_{\text{fp}}, \lambda) = \text{TypeCast}(\lambda \cdot a_{\text{fp}}, \text{int}_N) \]
  \[ \lambda = \frac{2^{N - 1}-1}{a_{\text{max}}} \]
- **De-quantization formula**:
  \[ a_{\text{fp}} = \tilde{Q}(a_{\text{int}}, \lambda) = \text{TypeCast}\left(\frac{a_{\text{int}}}{\lambda}, \text{float}_N\right) \]
- **Low-precision matrix multiplication** (the last step is approximate because of quantization error):
  \[ M_{\text{fp32}} = A_{\text{fp32}}\cdot B_{\text{fp32}} \approx \frac{A_{\text{int}}\cdot B_{\text{int}}}{\lambda_M} \]
  \[ \lambda_M=\lambda_A\cdot\lambda_B \]
- **Residual compensation matrix multiplication**, where \(R_{A,\text{int}}\) and \(R_{B,\text{int}}\) are the quantized residuals of \(A\) and \(B\), with scale factors \(\lambda_{R_A}\) and \(\lambda_{R_B}\):
  \[ C_{\text{fp}}=\frac{A_{\text{int}}\cdot B_{\text{int}}}{\lambda_A\cdot\lambda_B}+\frac{A_{\text{int}}\cdot R_{B,\text{int}}}{\lambda_A\cdot\lambda_{R_B}}+\frac{R_{A,\text{int}}\cdot B_{\text{int}}}{\lambda_{R_A}\cdot\lambda_B} \]

With these methods, the paper reports efficient acceleration of low-precision calculations while preserving the accuracy of the results.
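To make the formulas above concrete, here is a minimal pure-Python sketch of quantization, plain low-precision multiplication, and the three-term residual-compensated product. Several details are assumptions rather than facts from the paper: `TypeCast` is modelled as round-to-nearest, \(a_{\text{max}}\) is taken as the largest absolute entry, the residual is computed as \(R_A = A - A_{\text{int}}/\lambda_A\), and all function names are hypothetical. Ordinary Python integers stand in for the low-precision hardware types, so the sketch shows the arithmetic only, not the speed-up.

```python
def quantize(matrix, n_bits=8):
    """Return (integer matrix, scale) with scale = (2^(N-1) - 1) / a_max."""
    a_max = max(abs(x) for row in matrix for x in row)
    if a_max == 0.0:
        # All-zero input: use scale 1.0 to avoid dividing by zero.
        return [[0] * len(row) for row in matrix], 1.0
    scale = (2 ** (n_bits - 1) - 1) / a_max
    # Assumption: TypeCast rounds to nearest; the source does not specify.
    return [[round(scale * x) for x in row] for row in matrix], scale


def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def residual(matrix, q, scale):
    """Part of `matrix` lost by quantization: R = A - A_int / scale."""
    return [[x - xi / scale for x, xi in zip(row, qrow)]
            for row, qrow in zip(matrix, q)]


def low_precision_matmul(a, b, n_bits=8):
    """M ~ (A_int . B_int) / (lambda_A * lambda_B)."""
    a_int, lam_a = quantize(a, n_bits)
    b_int, lam_b = quantize(b, n_bits)
    lam_m = lam_a * lam_b
    return [[v / lam_m for v in row] for row in matmul(a_int, b_int)]


def residual_compensated_matmul(a, b, n_bits=8):
    """Three-term product from the residual compensation formula."""
    a_int, lam_a = quantize(a, n_bits)
    b_int, lam_b = quantize(b, n_bits)
    ra_int, lam_ra = quantize(residual(a, a_int, lam_a), n_bits)
    rb_int, lam_rb = quantize(residual(b, b_int, lam_b), n_bits)
    # Each term is an integer product paired with the scale that divides it.
    terms = [
        (matmul(a_int, b_int), lam_a * lam_b),
        (matmul(a_int, rb_int), lam_a * lam_rb),
        (matmul(ra_int, b_int), lam_ra * lam_b),
    ]
    rows, cols = len(a), len(b[0])
    return [[sum(t[i][j] / s for t, s in terms) for j in range(cols)]
            for i in range(rows)]


def max_abs_error(x, y):
    """Largest entry-wise absolute difference between two matrices."""
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


a = [[1.0, 0.5], [0.25, 0.1]]
b = [[0.3, -0.8], [0.6, 0.2]]
exact = matmul(a, b)
print("plain error:      ", max_abs_error(low_precision_matmul(a, b), exact))
print("compensated error:", max_abs_error(residual_compensated_matmul(a, b), exact))
```

On this small example the compensated product should land noticeably closer to the floating-point result than the plain low-precision product, at the cost of two extra integer multiplications. Those extra products are where a sparse representation of the residuals could reduce the work.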