Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only).

Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, Jason Cong
DOI: https://doi.org/10.1145/2435264.2435321
2013-01-01
Abstract: With the increase of data processing throughput in reconfigurable computing, data parallelism is now crucial for the performance of FPGA-based accelerators. However, most data parallelism optimizations are still performed manually by experienced hardware designers. Memory partitioning is widely adopted to increase memory bandwidth efficiently by using multiple memory banks and reducing data access conflicts. Previous methods for memory partitioning mainly focused on one-dimensional arrays. As a consequence, designers must flatten a multidimensional array to fit those methodologies, which ties the partition to the dimension widths of the array. In this work we propose an automatic memory partitioning scheme for multidimensional arrays that provides high data throughput from on-chip memories for loop pipelining in high-level synthesis. A linear transformation is applied to optimize the layout of the data elements in the memory banks, so that the partition is independent of the dimension widths. Two transformation vectors map each original data element to a bank and to an offset within that bank. The vector for the optimal bank mapping is determined by a conflict-free access constraint. In addition, a memory padding technique is proposed to find a vector for the inner-bank offset, trading off practicality against optimality. We evaluate the approach on six benchmarks with different access patterns. Compared to previous one-dimensional partitioning work, the experimental results show that our approach saves up to 21% of block RAMs, 19% of slices, and 46% of DSPs.
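
The bank-mapping idea in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes the bank of an element at index x is (alpha · x) mod N for a transformation vector alpha and N banks, and it finds alpha by brute-force search rather than by the paper's method. The function names (`bank_of`, `is_conflict_free`, `find_bank_vector`) and the 5-point stencil access pattern are hypothetical choices made for illustration, and the second vector (inner-bank offset) and the padding technique are not modeled.

```python
from itertools import product

def bank_of(index, alpha, num_banks):
    """Bank id of a multidimensional index under the assumed linear mapping."""
    return sum(a * x for a, x in zip(alpha, index)) % num_banks

def is_conflict_free(offsets, alpha, num_banks):
    """True if all accesses of one loop iteration land in distinct banks.

    The mapping is linear, so the bank of (base + offset) differs from the
    bank of base by an amount that depends only on the offset. Checking the
    offsets alone therefore covers every base index.
    """
    banks = [bank_of(off, alpha, num_banks) for off in offsets]
    return len(set(banks)) == len(banks)

def find_bank_vector(offsets, num_banks):
    """Brute-force search (an assumption, not the paper's method) for alpha."""
    dims = len(offsets[0])
    for alpha in product(range(num_banks), repeat=dims):
        if is_conflict_free(offsets, alpha, num_banks):
            return alpha
    return None  # no conflict-free linear mapping with this many banks

# Hypothetical access pattern: a 5-point stencil reading A[i][j] and its
# four neighbours in the same pipelined loop iteration.
STENCIL = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
```

Under these assumptions, `find_bank_vector(STENCIL, 5)` returns `(1, 2)`, so the five reads of each iteration hit five different banks regardless of the array's row or column width. With only four banks the search returns `None`, since five simultaneous accesses cannot be spread over four banks.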