A Highly Customizable and Efficient Hardware Implementation for Parallel Matrix Inversion

Sultan Alqahtani, Yiqun Zhu, Qizhi Shi, Xiaolin Meng, Xinhua Wang
DOI: https://doi.org/10.1109/ICFPT56656.2022.9974569
2022-01-01
Abstract: This paper introduces an efficient and customizable FPGA-based architecture for parallel matrix inversion. The architecture adapts to different matrix sizes while maintaining low latency and effective resource utilization. Hardware resource usage is optimized by re-using the same multiplication units for different calculations: multiple multiplication units operate in parallel to perform the normalization step and are then re-used for the elimination step. Performance is enhanced by maximizing parallelism and minimizing the sequential execution time of the division unit. Implementation results show that, compared with related works, the proposed architecture is flexible enough to support different matrix sizes with high parallel computing power. Additionally, the number of clock cycles and multiplication units of the proposed architecture is reduced proportionally to the increase in matrix size. The proposed architecture has been optimized for a Zynq xc7z045 FPGA and implemented using both single- and double-precision floating-point representations.
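The normalization and elimination steps named in the abstract can be illustrated with a small software sketch. The abstract does not name the underlying algorithm, so the sketch assumes a Gauss-Jordan style inversion on an augmented matrix; the function name `invert`, the partial pivoting, and the singularity threshold are illustrative assumptions and are not taken from the paper. Each pivot row is normalized with a single reciprocal followed by multiplications, which loosely mirrors the idea of keeping the division unit's sequential work small and letting the same multipliers serve both steps.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (software sketch only)."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: an assumption added for numerical safety,
        # not something stated in the abstract.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Normalization step: one division (the reciprocal), then only
        # multiplications across the pivot row.
        recip = 1.0 / aug[col][col]
        aug[col] = [x * recip for x in aug[col]]

        # Elimination step: multiply-and-subtract on every other row.
        # In hardware these multiplications could share the same units
        # that performed the normalization.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]
```

Calling `invert([[4, 7], [2, 6]])` returns a matrix close to `[[0.6, -0.7], [-0.2, 0.4]]`, up to floating-point rounding. The row-wise list comprehensions are the parts a parallel design would spread across multiple multiplication units; this sketch runs them sequentially.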