FPGA implementation for solving the linear least squares problem

Wang Shaojun, Liu Qi, Zhong Xuejie, Peng Xiyuan
DOI: https://doi.org/10.19650/j.cnki.cjsi.2012.03.033
2012-01-01
Abstract: Large calculation delay and poor parallelism greatly limit the efficiency of FPGA-based solvers for the least squares problem. We propose a novel approach based on modified Cholesky factorization to address this. With this approach, the least squares problem is divided into a matrix factorization part and a triangular matrix solving part. Parallelism is optimized by maximizing the number of PEs (Processing Elements) in each part. The calculation delay is decreased because the modified Cholesky factorization avoids the square root operation and eliminates the division operation. In the triangular matrix solving part, the same PEs are used to solve both the upper triangular and lower triangular systems, which saves FPGA resources. Experiments on a Virtex XC5VFX130T FPGA with a 100 MHz clock show a speedup of 8× over a dual-core CPU implementation in single precision.
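The two-part structure described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' FPGA design: it assumes the least squares problem is solved through the normal equations and that "modified Cholesky" means the square-root-free LDL^T factorization. The function names `ldlt` and `solve_least_squares` are hypothetical, and this textbook form still divides by the diagonal entries, so it does not reproduce the paper's division elimination.

```python
def ldlt(M):
    """Factor a symmetric positive-definite matrix M as L * diag(d) * L^T.

    L is unit lower triangular. No square roots are needed, unlike
    the classical Cholesky factorization M = G * G^T.
    """
    n = len(M)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry of D for column j.
        d[j] = M[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        # Entries of L below the diagonal in column j.
        for i in range(j + 1, n):
            s = M[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]  # assumption: plain division, not eliminated here
    return L, d


def solve_least_squares(A, b):
    """Minimize ||A x - b||_2 via the normal equations (A^T A) x = A^T b."""
    m, n = len(A), len(A[0])
    # Part 0: form the normal equations (assumed formulation).
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]

    # Part 1: matrix factorization.
    L, d = ldlt(M)

    # Part 2: triangular solves. Forward substitution for L z = c.
    z = [0.0] * n
    for i in range(n):
        z[i] = c[i] - sum(L[i][k] * z[k] for k in range(i))
    # Diagonal scaling for D y = z.
    y = [z[i] / d[i] for i in range(n)]
    # Back substitution for L^T x = y (upper triangular system).
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x
```

For example, fitting a line to the points (0, 1), (1, 3), (2, 5) with `solve_least_squares([[1, 0], [1, 1], [1, 2]], [1, 3, 5])` recovers an intercept of 1 and a slope of 2. The forward and back substitution loops have the same multiply-accumulate shape, which is consistent with the abstract's remark that the same PEs can serve both the lower and upper triangular solves.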