Advancing Matrix Decomposition Efficiency: A Study on FT-Matrix DSP Based SVD Optimization

Anxing Xie, Yonghua Hu, Aobo Cheng, Zhuoyou Tang, Peng Liu, Xin Zhang
DOI: https://doi.org/10.1109/CSCloud-EdgeCom58631.2023.00085
2023-01-01
Abstract: Matrix decomposition is a fundamental operation in linear algebra, with applications in machine learning, signal processing, edge computing, and many other fields. Singular Value Decomposition (SVD) is a matrix decomposition method that factors a matrix into three matrices: two orthogonal matrices and a diagonal matrix. With the development of domestic high-performance Digital Signal Processors (DSPs), the demand for matrix computation on DSP platforms is increasing, which makes DSP-based SVD implementations an important research topic. However, achieving a high-performance implementation requires developers who are familiar with the hardware characteristics and can match the features of the algorithm to the limited hardware resources. To reduce the cost of computing the SVD of a matrix, we implement a vectorization mapping method for the SVD algorithm on the FT-M7002. The single instruction, multiple data (SIMD) instructions of the FT-M7002 processor are used to exploit the data-level parallelism in the SVD algorithm. Instead of using data movement and a scalar processing unit (SPU), we compute with a single vector processing element (VPE). Additionally, a DMA transfer algorithm is designed to implement matrix transposition and resolve the issue of discontinuous data access. Experimental results show that the optimized SVD algorithm improves execution performance by up to 5.0× relative to the original SVD algorithm on the FT platform. Furthermore, we demonstrate that the optimized SVD algorithm on the FT-M7002 runs 1.0-2.0× faster than the optimized SVD algorithm on the TMS320C6678 processor.
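To make the factorization concrete, the following is a minimal pure-Python sketch of a one-sided (Hestenes) Jacobi SVD. The abstract does not say which SVD algorithm the authors vectorize, so the choice of Jacobi here is an assumption made only for illustration; it is a common fit for SIMD hardware because each step reduces to column dot products and a plane rotation applied elementwise. The function names `jacobi_svd` and `reconstruct` are hypothetical, and nothing here models the FT-M7002 instruction set or its DMA engine.

```python
import math


def jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """One-sided Jacobi SVD of an m x n matrix given as a list of rows (m >= n).

    Returns (u, s, v) such that a[i][j] ~= sum_k u[i][k] * s[k] * v[j][k].
    Illustrative sketch only; not the method from the paper.
    """
    m, n = len(a), len(a[0])
    # Store A column by column so every inner loop is a contiguous vector op.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    # vcols[j] is column j of V, starting from the identity.
    vcols = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / math.sqrt(1.0 + t * t)
                sn = cs * t
                # Apply the same rotation to the columns of A and of V.
                for vecs in (cols, vcols):
                    xp, xq = vecs[p], vecs[q]
                    vecs[p] = [cs * x - sn * y for x, y in zip(xp, xq)]
                    vecs[q] = [sn * x + cs * y for x, y in zip(xp, xq)]
        if not rotated:
            break

    # Singular values are the norms of the orthogonalized columns.
    sigma = [math.sqrt(sum(x * x for x in col)) for col in cols]
    order = sorted(range(n), key=lambda j: -sigma[j])
    s = [sigma[j] for j in order]
    # Columns with a zero singular value are left as zero vectors in U.
    u = [[cols[j][i] / sigma[j] if sigma[j] > 0.0 else 0.0 for j in order]
         for i in range(m)]
    v = [[vcols[j][i] for j in order] for i in range(n)]
    return u, s, v


def reconstruct(u, s, v):
    """Multiply U * diag(s) * V^T back together to check the factorization."""
    return [[sum(u[i][k] * s[k] * v[j][k] for k in range(len(s)))
             for j in range(len(v))] for i in range(len(u))]
```

Calling `jacobi_svd([[3.0, 0.0], [4.0, 5.0]])` and passing the result to `reconstruct` should reproduce the input matrix up to rounding error. The column-major layout is deliberate: it keeps each dot product and rotation on contiguous data, which is the kind of access pattern a matrix transposition step is typically used to obtain.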