Matrix Multiplication Based on Scalable Macro-Pipelined FPGA Accelerator Architecture

Jiang Jiang, Vincent Mirian, Kam Pui Tang, Paul Chow, Zuocheng Xing
DOI: https://doi.org/10.1109/ReConFig.2009.30
2009-01-01
Abstract: In this paper, we introduce a scalable macro-pipelined architecture (SMPA) for floating-point matrix multiplication that aims to exploit temporal parallelism and architectural scalability. We demonstrate the functionality of the hardware design with 16 processing elements (PEs) on a Xilinx ML507 development board containing a Virtex-5 XC5VFX70T FPGA. A 32-PE design is also simulated for matrix sizes ranging from 32×32 to 1024×1024. Our experiments show that the design achieves 12.18 GFLOPS with 32 PEs, or about 1.90 GFLOPS per PE per GHz, which corresponds to over 95% PE utilization. Moreover, the proposed SMPA can scale to tens or hundreds of GFLOPS using multiple FPGA devices and a high-speed interconnect.
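The throughput figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only. It relies on the unit identity that GFLOPS per PE per GHz equals FLOPs per PE per clock cycle, and on one assumption that the abstract does not state: that each PE peaks at 2 FLOPs per cycle (one multiply and one add). The function names are hypothetical, and the clock frequency it prints is implied by the quoted numbers, not reported by the authors.

```python
# Figures quoted in the abstract.
total_gflops = 12.18          # throughput reported for the 32-PE design
num_pes = 32
gflops_per_pe_per_ghz = 1.90  # "about 1.90" in the abstract

# ASSUMPTION (not stated in the abstract): each PE can complete one
# multiply and one add per clock cycle, i.e. a peak of 2 FLOPs per cycle.
assumed_peak_flops_per_cycle = 2.0


def implied_clock_ghz(total, pes, per_pe_per_ghz):
    """Clock frequency implied by the total and normalized throughput."""
    return total / (pes * per_pe_per_ghz)


def pe_utilization(per_pe_per_ghz, peak_per_cycle):
    """Fraction of the assumed per-PE peak that is sustained."""
    return per_pe_per_ghz / peak_per_cycle


clock = implied_clock_ghz(total_gflops, num_pes, gflops_per_pe_per_ghz)
util = pe_utilization(gflops_per_pe_per_ghz, assumed_peak_flops_per_cycle)

print(f"implied clock: {clock * 1000:.0f} MHz")
print(f"PE utilization under the 2-FLOP assumption: {util:.1%}")
```

Under that assumption, the rounded 1.90 figure gives a utilization of about 95%, in line with the abstract's claim, and the two throughput numbers together imply a clock of roughly 200 MHz.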