Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering

Yang Liu, Yiheng Zhang, Xiaoran Hao, Lan Chen, Mao Ni, Ming Chen, Rong Chen
DOI: https://doi.org/10.3390/electronics13050975
IF: 2.9
2024-03-05
Electronics
Abstract: Convolutional neural networks have been widely applied in the field of computer vision. In convolutional neural networks, convolution operations account for more than 90% of the total computational workload. The current mainstream approach to achieving high energy-efficient convolution operations is through dedicated hardware accelerators. Convolution operations involve a significant amount of weights and input feature data. Due to limited on-chip cache space in accelerators, there is a significant amount of off-chip DRAM memory access involved in the computation process. The latency of DRAM access is 20 times higher than that of SRAM, and the energy consumption of DRAM access is 100 times higher than that of multiply–accumulate (MAC) units. It is evident that the "memory wall" and "power wall" issues in neural network computation remain challenging. This paper presents the design of a hardware accelerator for convolutional neural networks. It employs a dataflow optimization strategy based on on-chip data reordering. This strategy improves on-chip data utilization and reduces the frequency of data exchanges between on-chip cache and off-chip DRAM. The experimental results indicate that compared to the accelerator without this strategy, it can reduce data exchange frequency by up to 82.9%.
Categories: Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?
This paper addresses the "memory wall" and "power wall" problems that convolutional neural networks (CNNs) face during computation. Convolution operations account for more than 90% of a CNN's total computational workload, and they involve large volumes of weights and input feature map data. Because on-chip cache space in an accelerator is limited, data must be read frequently from off-chip DRAM. According to the abstract, DRAM access latency is 20 times higher than that of SRAM, and DRAM access energy is 100 times higher than that of a multiply-accumulate (MAC) unit, so these accesses both slow computation and significantly increase power consumption. To address this, the paper proposes a hardware accelerator design with a dataflow optimization strategy based on on-chip data reordering. The strategy improves on-chip data utilization and reduces the frequency of data exchanges between the on-chip cache and off-chip DRAM. The experimental results show that, compared with the accelerator without this strategy, it reduces data exchange frequency by up to 82.9%. In short, the paper aims to improve the energy and computational efficiency of CNN accelerators through better on-chip data reuse, thereby reducing dependence on off-chip memory.
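To make the link between on-chip data reuse and off-chip traffic concrete, the following is a minimal toy model, not the paper's reordering scheme. It counts how many input feature map rows would be fetched from off-chip memory for one unpadded convolution, with and without keeping overlapping rows on chip. The function name `input_row_fetches` and its parameters are hypothetical, and the resulting numbers are illustrative only; they are unrelated to the 82.9% figure reported in the paper.

```python
def input_row_fetches(height, kernel, stride=1, reuse=True):
    """Count input rows fetched from off-chip memory (toy model, no padding).

    height: number of rows in the input feature map
    kernel: kernel height (rows covered by one sliding window)
    stride: vertical stride of the window
    reuse:  if True, a row already brought on chip is never fetched again
    """
    out_rows = (height - kernel) // stride + 1
    if not reuse:
        # Every output row reloads its full window of input rows.
        return out_rows * kernel
    # With reuse, each distinct input row touched by any window is fetched once.
    fetched = set()
    for r in range(out_rows):
        start = r * stride
        fetched.update(range(start, start + kernel))
    return len(fetched)


if __name__ == "__main__":
    no_reuse = input_row_fetches(8, 3, reuse=False)
    with_reuse = input_row_fetches(8, 3, reuse=True)
    reduction = 1 - with_reuse / no_reuse
    print(no_reuse, with_reuse, f"{reduction:.1%}")
```

In this sketch, the saving grows with the overlap between adjacent windows (large kernels, small strides), which is the general reason data reuse lowers the number of exchanges between on-chip cache and DRAM.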