What problem does this paper attempt to address?

This paper addresses the acceleration of matrix multiplication. In fields such as deep learning and scientific computing, it asks how low-precision approximate general matrix multiplication (GEMM) can improve computational efficiency without a significant loss of accuracy. Specifically, the paper proposes a high-performance approximate matrix multiplication algorithm named LRAMM (Low precision approximates GEMM via RSVD), which combines mixed-precision quantized matrix multiplication with randomized singular value decomposition (RSVD) to further improve efficiency while staying within the error range of low-precision matrix multiplication.

### Problem Background

Matrix multiplication is widely used in fields such as deep learning, molecular dynamics simulation, and ab initio calculation. In deep learning, convolutional layers account for about 90% of the computing time, and these operations can be converted into matrix multiplications. Optimizing matrix multiplication is therefore crucial for overall application performance.

### Solution

To accelerate matrix multiplication, the paper does the following:

1. **Introduces the main approximate matrix multiplication methods**: It analyzes their respective advantages and disadvantages.
2. **Proposes an algorithm that combines a mixed low-precision quantization strategy with low-rank approximate multiplication**: The goal is to improve computing speed.
3. **Analyzes the computational error and efficiency of low-rank approximation and low-precision calculation**: It designs a series of experiments to verify the effectiveness of the proposed algorithm.

### Key Technologies

- **Low-rank approximation (RSVD)**: Randomized singular value decomposition (RSVD) is used to compute low-rank approximations of the matrices, which reduces computational complexity.
- **Low-precision quantization**: Matrix elements are quantized to low-precision values (such as 4-bit or 8-bit integers) to reduce the amount of computation and speed up processing.
- **Mixed-precision strategy**: Quantization strategies with different precisions are used at different stages to balance computational efficiency and accuracy.

### Formula Representation

- **Quantization formula**, with scale factor \(\lambda\) and integer bit width \(N\):

\[ a_{\text{int}} = Q(a_{\text{fp}}, \lambda) = \text{TypeCast}(\lambda \cdot a_{\text{fp}}, \text{int}_N) \]

\[ \lambda = \frac{2^{N - 1} - 1}{a_{\max}} \]

\[ a_{\text{fp}} = \hat{Q}(a_{\text{int}}, \lambda) = \text{TypeCast}\left(\frac{a_{\text{int}}}{\lambda}, \text{float}\right) \]

- **Low-rank approximate matrix multiplication**, where \(U_r \Sigma_r V_r^T\) and \(W_r \Gamma_r Z_r^T\) are rank-\(r\) approximations of the two input matrices:

\[ C' = (U_r \Sigma_r V_r^T)(W_r \Gamma_r Z_r^T) = (U_r \Sigma_r)(V_r^T W_r)(\Gamma_r Z_r^T) \]

### Summary

This paper aims to make matrix multiplication more efficient by combining low-rank approximation with low-precision quantization. The approach is especially suited to deep learning and scientific computing, which require a large number of matrix operations. According to the paper, it significantly improves computing speed while maintaining relatively high accuracy.
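To make the two formulas concrete, here is a minimal pure-Python sketch. It is not the paper's implementation. All function names are hypothetical. It assumes that `a_max` is the largest absolute value among the quantized elements and that `TypeCast` rounds to the nearest integer, neither of which is specified above. The factor matrices are hand-picked for illustration rather than produced by RSVD; the re-association identity holds for any factors of compatible shape.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper.

def quantize(values, n_bits):
    """a_int = TypeCast(lambda * a_fp, int_N), lambda = (2^(N-1) - 1) / a_max."""
    # Assumption: a_max is the largest absolute value in the input.
    a_max = max(abs(v) for v in values)
    lam = (2 ** (n_bits - 1) - 1) / a_max if a_max > 0 else 1.0
    # Assumption: TypeCast rounds to nearest (Python rounds ties to even).
    return [int(round(lam * v)) for v in values], lam


def dequantize(ints, lam):
    """a_fp = TypeCast(a_int / lambda, float)."""
    return [q / lam for q in ints]


def matmul(X, Y):
    """Plain dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def scale_columns(X, s):
    """X @ diag(s)."""
    return [[x * sj for x, sj in zip(row, s)] for row in X]


def scale_rows(s, X):
    """diag(s) @ X."""
    return [[si * x for x in row] for si, row in zip(s, X)]


def low_rank_matmul(U, sigma, V, W, gamma, Z):
    """C' = (U_r Sigma_r)(V_r^T W_r)(Gamma_r Z_r^T)."""
    left = scale_columns(U, sigma)            # m x r
    core = matmul(transpose(V), W)            # r x r
    right = scale_rows(gamma, transpose(Z))   # r x n
    return matmul(matmul(left, core), right)


# Quantization round trip with N = 8 bits.
values = [0.5, -1.0, 0.25]
q, lam = quantize(values, 8)      # q == [64, -127, 32], lam == 127.0
restored = dequantize(q, lam)

# Rank-1 factors for A (2 x 3) and B (3 x 2), chosen by hand.
U, sigma, V = [[1.0], [2.0]], [3.0], [[1.0], [0.0], [1.0]]
W, gamma, Z = [[1.0], [1.0], [0.0]], [2.0], [[1.0], [-1.0]]

C_low = low_rank_matmul(U, sigma, V, W, gamma, Z)

# Reference: form A and B explicitly, then multiply them directly.
A = matmul(scale_columns(U, sigma), transpose(V))
B = matmul(W, scale_rows(gamma, transpose(Z)))
C_full = matmul(A, B)
```

For an m x k matrix times a k x n matrix, the direct product takes about m·k·n multiply-adds. The re-associated product takes about r²(k + m) + m·n·r, which is smaller when the rank r is much less than k. In this sketch the round-trip quantization error per element is at most 0.5/λ, since each scaled value is rounded to the nearest integer.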

LRAMM -- Low precision approximates GEMM via RSVD

A method of using RSVD in residual calculation of LowBit GEMM

Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio

An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement

Technological Exploration of RRAM Crossbar Array for Matrix-Vector Multiplication

Accelerating approximate matrix multiplication for near-sparse matrices on GPUs

LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

Generalized Low-Rank Approximations of Matrices Revisited

Five-Precision GMRES-Based Iterative Refinement

A method for accelerating low precision operations by sparse matrix multiplication

Optimized recursive approximate multipliers for edge detection and image smoothing applications

Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs

Cascading GEMM: High Precision from Low Precision

Accelerating Sparse Approximate Matrix Multiplication on GPUs

Precision-Aware Iterative Algorithms Based on Group-Shared Exponents of Floating-Point Numbers

Acceleration of Approximate Matrix Multiplications on GPUs

Optimization of SpGEMM with Risc-V vector instructions

Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication

Permutation-Based Approximate Multiplier with High Accuracy

A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication