A Low Latency High Throughput Multiply-accumulator Unit for Floating Point and Integer

Jun SHEN, Hai-bin SHEN, Yu-long YU
DOI: https://doi.org/10.3969/j.issn.1000-3428.2013.06.018
2013-01-01
Abstract: To resolve data hazards in the vector dot product operations of a floating-point unit, a low-latency, single-cycle accumulator architecture is used in a 7-stage pipelined, configurable multiply-accumulator design. The unit is compatible with double-precision floating-point, dual single-precision floating-point and 32-bit signed integer operands. Both fused multiply-add operations and continuous multiply-accumulate operations are supported. In addition, energy consumption is controlled using operand isolation and clock gating. Implementation results on a Virtex-4 device show that the accumulator architecture achieves high performance, low latency and single-cycle throughput, and that its area-time product is 30% lower than that of a design built with Xilinx floating-point IP cores.
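The data hazard mentioned in the abstract can be illustrated with a rough cycle-count model. The sketch below is not the authors' design: the function names, the latency parameters and the numbers in the usage lines are all assumptions chosen for illustration. It compares a dot product in which each accumulation must wait for the previous sum to leave a multi-cycle adder against one in which the accumulation loop closes in a single cycle.

```python
def cycles_with_stalls(n_terms: int, mul_latency: int, add_latency: int) -> int:
    """Cycles for an n-term dot product when the running sum passes through
    a multi-cycle adder (hypothetical model, not taken from the paper).

    Each accumulation reads the result of the previous one, so the next
    product cannot enter the adder until the previous sum has been written
    back: a read-after-write hazard that serializes the additions.
    """
    # The first product is ready after mul_latency cycles; after that,
    # every accumulation occupies the adder for add_latency cycles.
    return mul_latency + n_terms * add_latency


def cycles_single_cycle_accumulate(n_terms: int, mul_latency: int) -> int:
    """Cycles when the accumulation loop closes in one cycle, so a new
    product is accepted every cycle (hypothetical model)."""
    return mul_latency + n_terms


if __name__ == "__main__":
    # Illustrative values only; these are not the paper's stage counts.
    n, mul_lat, add_lat = 8, 3, 4
    print(cycles_with_stalls(n, mul_lat, add_lat))       # 3 + 8 * 4 = 35
    print(cycles_single_cycle_accumulate(n, mul_lat))    # 3 + 8 = 11
```

Under these assumed latencies the stalled version grows with the adder latency times the vector length, while the single-cycle loop grows only with the vector length, which is the throughput property the abstract attributes to the accumulator.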