Optimizing Residue Number System on FPGA

Jiahe Liu, Bangtian Liu, Haohuan Fu
DOI: https://doi.org/10.1109/ithings-greencom-cpscom-smartdata.2016.137
2016-01-01
Abstract: Originating from the Chinese Remainder Theorem in the 4th century AD, the Residue Number System (RNS) has been regarded as a promising number representation method in the field of digital computer arithmetic. Even though reconfigurable hardware devices such as Field Programmable Gate Arrays (FPGAs) have been a popular platform for RNS applications due to their suitable architectural features, work that explores highly efficient RNSs on FPGA devices is still rare. In this paper, we replace the most commonly used moduli set {2^n - 1, 2^n, 2^n + 1} with {2^n - 1, 2^{2n}, 2^n + 1}, and explore the performance potential of the whole RNS system by optimizing the residue arithmetic units and deploying a highly efficient RNS on the FPGA platform. Furthermore, we develop a user-controlled FPGA-based RNS library generator for the moduli set {2^n - 1, 2^{2n}, 2^n + 1}. Our RNS reduces the latency and LUT cost by up to 20% and 42% respectively for a large number of additions, and saves up to 45% of the DSP cost compared with large bit-width binary forms for multiplications. Several experiments demonstrate that, for computation-intensive applications involving a large bit-width setting or a large number of calculations, RNS performs better than binary forms. Our optimized RNS design is applicable to other RNS applications.
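To make the number representation concrete, the following is a minimal software sketch of arithmetic over the moduli set {2^n - 1, 2^{2n}, 2^n + 1}. It is not the authors' FPGA design or their library generator: the function names (`moduli`, `to_rns`, `rns_add`, `rns_mul`, `from_rns`) are hypothetical, and the reverse conversion uses a textbook Chinese Remainder Theorem reconstruction chosen here for illustration only. It assumes Python 3.8 or later and n >= 2.

```python
from math import prod


def moduli(n):
    """Return the moduli set (2^n - 1, 2^(2n), 2^n + 1); assumes n >= 2."""
    return (2**n - 1, 2 ** (2 * n), 2**n + 1)


def to_rns(x, mods):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in mods)


def rns_add(a, b, mods):
    """Channel-wise modular addition; no carries cross between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, mods))


def rns_mul(a, b, mods):
    """Channel-wise modular multiplication on the small residues."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, mods))


def from_rns(r, mods):
    """Reverse conversion via the Chinese Remainder Theorem (illustrative)."""
    M = prod(mods)  # dynamic range; results are only valid in [0, M)
    total = 0
    for ri, m in zip(r, mods):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m
        total += ri * Mi * pow(Mi, -1, m)
    return total % M


if __name__ == "__main__":
    mods = moduli(4)  # n = 4 is an arbitrary choice for this example
    x, y = 1234, 37
    rx, ry = to_rns(x, mods), to_rns(y, mods)
    print(mods, prod(mods))
    print(from_rns(rns_add(rx, ry, mods), mods) == x + y)
    print(from_rns(rns_mul(rx, ry, mods), mods) == x * y)
```

The three moduli are pairwise coprime (two odd numbers that differ by 2, and a power of two), so the dynamic range is their product, 2^{4n} - 2^{2n}. Each channel works on residues far narrower than the full operand, which is the property the abstract contrasts with large bit-width binary arithmetic.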