An All-Digital Compute-In-Memory FPGA Architecture for Deep Learning Acceleration

Yonggen Li, Xin Li, Haibin Shen, Jicong Fan, Yanfeng Xu, Kejie Huang
DOI: https://doi.org/10.1145/3640469
IF: 2.837
2024-01-16
ACM Transactions on Reconfigurable Technology and Systems
Abstract: The Field Programmable Gate Array (FPGA) is a versatile, programmable hardware platform, which makes it a promising candidate for accelerating Deep Neural Networks (DNNs). However, an FPGA's computing energy efficiency is low because interconnect data movement dominates its energy consumption. In this paper, we propose an all-digital Compute-In-Memory (CIM) FPGA architecture for deep learning acceleration. We present a bit-serial computing circuit in the digital CIM core for accelerating vector-matrix multiplication (VMM) operations. A Network-CIM-Deployer (NCIMD) is also developed to support automatic deployment and mapping of DNNs; it provides a user-friendly API for DNN models in Caffe format. We introduce a Weight-Stationary (WS) dataflow and describe how a single network layer is mapped to the CIM array in the architecture. We evaluate the proposed FPGA architecture on Deep Learning (DL) and non-DL workloads, using different architectural layouts and mapping strategies, and compare the results with a conventional FPGA architecture. The experimental results show that, compared with the conventional FPGA architecture, the proposed CIM FPGA architecture improves energy efficiency by up to 16.1× and reduces latency by up to 40%.
computer science, hardware & architecture
What problem does this paper attempt to address?
The paper addresses the low energy efficiency of deep learning acceleration, particularly for deployment on Internet of Things (IoT) terminal devices. The authors propose an all-digital Compute-In-Memory (CIM) Field Programmable Gate Array (FPGA) architecture to accelerate Deep Neural Networks (DNNs). The main issues are:

1. **Low energy efficiency of traditional FPGAs**: Interconnect data movement dominates energy consumption, so traditional FPGAs are not energy-efficient for deep learning acceleration.
2. **Limitations of existing hardware platforms**: Mainstream platforms such as Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and FPGAs each have limitations. GPUs offer superior computational performance but are constrained by the von Neumann architecture. ASICs are energy-efficient but have high customization costs and adapt poorly to algorithm changes.

To address these issues, the paper proposes the following:

- **All-digital CIM FPGA architecture**: Combines storage and computing units so that computation executes directly in memory, reducing the energy consumed by data movement.
- **Bit-serial computing circuits**: Accelerate Vector-Matrix Multiplication (VMM) operations in the digital CIM core.
- **NCIMD toolchain**: The Network-CIM-Deployer automatically deploys and maps DNNs onto the proposed CIM FPGA architecture and provides a user-friendly API for DNN models in Caffe format.

In the reported experiments, the proposed CIM FPGA architecture improves energy efficiency by up to 16.1 times and reduces latency by up to 40% compared with a traditional FPGA architecture.
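
The bit-serial VMM idea mentioned above can be illustrated in software. The following is a minimal sketch, not the paper's circuit: it assumes unsigned inputs of `n_bits` bits and integer weights that stay fixed (weight-stationary) while the input bits are fed in one bit position per step. The function name `bit_serial_vmm` and its parameters are hypothetical and chosen only for illustration.

```python
def bit_serial_vmm(x, W, n_bits=8):
    """Compute y = x @ W by processing one input bit position per step.

    x: list of unsigned integers, each in [0, 2**n_bits).
    W: list of len(x) rows, each a list of integer weights (one per output column).
    This is an illustrative software model, not the circuit from the paper.
    """
    if len(x) != len(W):
        raise ValueError("x must have one element per row of W")
    if any(xi < 0 or xi >= (1 << n_bits) for xi in x):
        raise ValueError("inputs must be unsigned and fit in n_bits")

    n_cols = len(W[0])
    acc = [0] * n_cols
    for b in range(n_bits):
        # Step b: take bit b of every input element (a 0/1 vector).
        bits = [(xi >> b) & 1 for xi in x]
        for j in range(n_cols):
            # A 1-bit input times a weight is just a gated copy of the weight;
            # summing the gated weights gives the partial sum for this bit plane.
            partial = sum(row[j] for bit, row in zip(bits, W) if bit)
            # Shift-and-add: weight the partial sum by the bit's significance.
            acc[j] += partial << b
    return acc


if __name__ == "__main__":
    x = [3, 5, 2]
    W = [[1, 2], [0, -1], [4, 3]]
    print(bit_serial_vmm(x, W, n_bits=4))
```

Because each step only needs a 1-bit gate per weight followed by an accumulation, the weights never move; only the input bits and partial sums do. That is the general motivation for pairing bit-serial arithmetic with a weight-stationary dataflow, though the actual mapping used in the paper may differ from this sketch.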