LRAMM -- Low precision approximates GEMM via RSVD

Hongyaoxing Gu
2024-05-27
Abstract: Matrix multiplication computation acceleration has been a research hotspot across various domains. Due to the characteristics of some applications, approximate matrix multiplication can achieve significant performance improvements without losing much precision. In this paper, we propose LRAMM, a high-performance matrix multiplication approximation algorithm that combines mixed-precision quantized matrix multiplication with RSVD techniques, further enhancing efficiency within the error range of low-precision matrix multiplication by utilizing matrix low-rank decomposition technology.
Numerical Analysis, Performance
What problem does this paper attempt to address?
This paper addresses the acceleration of matrix multiplication, in particular how low-precision approximate matrix multiplication (GEMM) can improve computational efficiency in fields such as deep learning and scientific computing without significant loss of precision. Specifically, it proposes LRAMM (Low precision approximates GEMM via RSVD), a high-performance approximate matrix multiplication algorithm that combines mixed-precision quantized matrix multiplication with randomized singular value decomposition (RSVD) to further improve efficiency while staying within the error range of low-precision matrix multiplication.

### Problem Background

Matrix multiplication is widely used in many fields, such as deep learning, molecular dynamics simulation, and ab initio calculations. In deep learning, convolutional layers account for about 90% of the computing time, and these operations can be converted into matrix multiplications. Optimizing the performance of matrix multiplication is therefore crucial for improving overall application performance.

### Solution

To accelerate matrix multiplication, the paper makes the following contributions:

1. **Survey of the main approximate matrix multiplication methods**: Analyzes their respective advantages and disadvantages.
2. **An algorithm that combines a mixed low-precision quantization strategy with low-rank approximate multiplication**: Aims to improve computing speed.
3. **Analysis of the computational error and efficiency of low-rank approximation and low-precision calculation**: A series of experiments is designed to verify the effectiveness of the proposed algorithm.

### Key Technologies

- **Low-rank approximation (RSVD)**: Use randomized singular value decomposition (RSVD) to compute low-rank approximations of the matrices and reduce the computational complexity.
- **Low-precision quantization**: Quantize the matrix elements into low-precision values (such as 4-bit or 8-bit integers) to reduce the amount of calculation and speed up processing.
- **Mixed-precision strategy**: Use quantization strategies with different precisions at different stages to balance computational efficiency and precision.

### Formula Representation

- **Quantization formula**:

\[ a_{\text{int}} = Q(a_{\text{fp}}, \lambda) = \text{TypeCast}(\lambda \cdot a_{\text{fp}}, \text{int}_N) \]

\[ \lambda = \frac{2^{N - 1} - 1}{a_{\max}} \]

\[ a_{\text{fp}} = \hat{Q}(a_{\text{int}}, \lambda) = \text{TypeCast}\left(\frac{a_{\text{int}}}{\lambda}, \text{float}\right) \]

- **Low-rank approximate matrix multiplication**:

\[ C' = (U_r \Sigma_r V_r^T)(W_r \Gamma_r Z_r^T) = (U_r \Sigma_r)(V_r^T W_r)(\Gamma_r Z_r^T) \]

### Summary

This paper aims to optimize the computational efficiency of matrix multiplication by combining low-rank approximation with low-precision quantization, an approach especially suited to deep learning and scientific computing workloads that require a large number of matrix operations. With this method, computing speed can be significantly improved while maintaining relatively high precision.
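The quantization formulas listed under Formula Representation can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than statements from the paper: \(a_{\max}\) is taken to be the largest absolute value in the matrix, \(N\) the bit width of the target integer type, the cast rounds to nearest, and the quantized values are stored in an `int32` container because NumPy has no 4-bit type. The function names `quantize`, `dequantize`, and `quantized_matmul` are hypothetical.

```python
import numpy as np


def quantize(a_fp, n_bits=8):
    """a_int = TypeCast(lambda * a_fp, int_N), with lambda = (2^(N-1) - 1) / a_max."""
    a_max = float(np.max(np.abs(a_fp)))  # assumption: a_max is the largest magnitude
    lam = (2 ** (n_bits - 1) - 1) / a_max if a_max > 0 else 1.0
    # Assumption: round to nearest before casting; the formula only says TypeCast.
    a_int = np.rint(lam * a_fp).astype(np.int32)
    return a_int, lam


def dequantize(a_int, lam):
    """a_fp = TypeCast(a_int / lambda, float)."""
    return a_int.astype(np.float64) / lam


def quantized_matmul(a, b, n_bits=8):
    """Multiply in integer arithmetic, then undo both scale factors."""
    a_int, lam_a = quantize(a, n_bits)
    b_int, lam_b = quantize(b, n_bits)
    c_int = a_int @ b_int  # integer accumulation
    return c_int.astype(np.float64) / (lam_a * lam_b)


# Compare 8-bit and 4-bit quantized products against the float64 product.
rng = np.random.default_rng(0)
a = rng.standard_normal((16, 32))
b = rng.standard_normal((32, 16))
exact = a @ b
for bits in (8, 4):
    approx = quantized_matmul(a, b, bits)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"int{bits}: relative error {rel_err:.3e}")
```

Because each matrix carries its own scale factor \(\lambda\), the integer product is rescaled once by \(1/(\lambda_A \lambda_B)\) instead of dequantizing the operands first.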
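The regrouped low-rank product \(C' = (U_r \Sigma_r)(V_r^T W_r)(\Gamma_r Z_r^T)\) can likewise be sketched in NumPy. The `rsvd` helper below is the basic textbook randomized SVD (a Gaussian range finder followed by a small exact SVD); whether the paper uses this exact variant is not confirmed by the summary, and the `oversample` and `seed` parameters as well as the function names are assumptions made for illustration.

```python
import numpy as np


def rsvd(a, rank, oversample=8, seed=0):
    """Basic randomized SVD: random range finder, then an exact SVD of a small matrix."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)
    omega = rng.standard_normal((n, k))   # Gaussian test matrix
    q, _ = np.linalg.qr(a @ omega)        # orthonormal basis for the sampled range of A
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank, :]


def lowrank_matmul(a, b, rank):
    """Approximate A @ B as (U_r S_r)(V_r^T W_r)(G_r Z_r^T)."""
    u, s, vt = rsvd(a, rank, seed=0)      # A ~ U_r Sigma_r V_r^T
    w, g, zt = rsvd(b, rank, seed=1)      # B ~ W_r Gamma_r Z_r^T
    left = u * s                          # U_r Sigma_r     (m x r)
    core = vt @ w                         # V_r^T W_r       (r x r)
    right = g[:, None] * zt               # Gamma_r Z_r^T   (r x n)
    return (left @ core) @ right


# Demo on matrices that are exactly rank 5, so a rank-5 approximation loses nothing.
rng = np.random.default_rng(42)
a = rng.standard_normal((64, 5)) @ rng.standard_normal((5, 48))
b = rng.standard_normal((48, 5)) @ rng.standard_normal((5, 32))
approx = lowrank_matmul(a, b, rank=5)
exact = a @ b
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The regrouping matters because the only inner-dimension-sized contraction is the small \(r \times r\) core \(V_r^T W_r\); the rank-\(r\) approximations of the two inputs are never expanded back to full size.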