Enhancing ConvNets With ConvFIFO: A Crossbar PIM Architecture Based on Kernel-Stationary First-In-First-Out Dataflow
Yu Qian,Liang Zhao,Fanzi Meng,Xiapeng Xu,Cheng Zhuo,Xunzhao Yin
DOI: https://doi.org/10.1109/tvlsi.2024.3409648
2024-08-28
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract:Convolutional neural networks (ConvNets) have long been the model of choice for computer vision (CV) problems and gained renewed traction lately. In order to compute ConvNets more efficiently, process-in-memory (PIM) architectures based on emerging non-volatile memories (NVMs) such as RRAM have been widely studied. However, conventional NVM-based PIM suffered from various non-idealities including IR drop, sneak-path currents, large analog-to-digital converter (ADC) overhead, device variations, circuits mismatch, and error propagation. In this work, we propose ConvFIFO, a crossbar-memory-based PIM architecture for ConvNets featuring a kernel-stationary dataflow. Through the design of FIFO-type input and output buffers, smaller row-activation parallelism, and more compact ADCs, ConvFIFO can maximize the reuse rates of inputs and partial sums to achieve a more balanced trade-off among throughput, accuracy, and area/energy consumption. Using SRAM-based FIFO as the input/output buffer, ConvFIFO achieves a systolic architecture without the need to move weight data, bypassing the limitation of NVM endurance and minimizing the movement of partial sums. Moreover, the FIFO nature of the dataflow allows flexible pipeline design and load balancing. Compared to classical NVM-based PIM architectures such as ISAAC, ConvFIFO exhibits significant performance enhancement for various ConvNet models, showing 1.66– /1.69– /4.23– /1.59– improvement in terms of energy consumption, latency, Ops/W, and Ops/s mm2, respectively. Compared to GPUs, ConvFIFO exhibits only an average accuracy loss of 1.82% during inference.
engineering, electrical & electronic,computer science, hardware & architecture