A 16.41 TOPS/W CNN Accelerator with Event-Based Layer Fusion for Real-Time Inference

Jiawei Wang, Li Lun, Zhenhui Dai, Yuanyuan Jiang, Xiaoxin Cui
DOI: https://doi.org/10.1109/iscas58744.2024.10558289
2024-01-01
Abstract: This paper proposes a convolutional neural network (CNN) accelerator architecture for real-time tasks on edge devices. An event-based layer fusion technique is adopted to eliminate the on-chip storage requirements and off-chip data movement caused by features. A cross-layer pipeline is elaborated during layer fusion to obtain high throughput and low latency. An adaptive fully unrolling event-driven core is designed, and a cyclic storage method is exploited to reduce the storage space for partial sums in the core. A modified LeNet is accelerated with the proposed architecture. The accelerator reaches an energy efficiency of 16.41 TOPS/W and a latency of 0.85 µs in TSMC 28 nm technology, and a frame rate of 369.4K FPS on FPGA.
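The abstract does not spell out how the cyclic storage of partial sums works, so the following is only an illustrative sketch of one plausible reading, not the authors' design. It assumes a row-major stream of input pixels, a square K×K kernel with stride 1 and no padding, and treats every nonzero pixel as an "event" that scatters its contribution into the affected outputs. Under those assumptions only K output rows are ever incomplete at once, so their partial sums fit in a buffer of K rows indexed by `output_row % K`. The function names `event_conv2d_cyclic` and `direct_conv2d` are hypothetical.

```python
def event_conv2d_cyclic(image, kernel):
    """Event-driven valid convolution (cross-correlation form) that keeps
    partial sums in a cyclic buffer of K rows instead of a full output map.
    Assumption: square KxK kernel, stride 1, no padding, row-major input."""
    H, W = len(image), len(image[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    # Cyclic partial-sum storage: output row o lives in slot o % K.
    psum = [[0] * out_w for _ in range(K)]
    output = []
    for r in range(H):
        for c in range(W):
            x = image[r][c]
            if x == 0:
                continue  # no event: zero inputs contribute nothing
            # Scatter this input event to every output it overlaps.
            for kr in range(K):
                o_r = r - kr
                if not 0 <= o_r < out_h:
                    continue
                slot = psum[o_r % K]
                for kc in range(K):
                    o_c = c - kc
                    if 0 <= o_c < out_w:
                        slot[o_c] += x * kernel[kr][kc]
        # After input row r, output row r - K + 1 has received all inputs.
        done = r - K + 1
        if done >= 0:
            output.append(psum[done % K])
            psum[done % K] = [0] * out_w  # free the slot for row done + K
    return output


def direct_conv2d(image, kernel):
    """Reference implementation that gathers inputs per output pixel."""
    K = len(kernel)
    out_h, out_w = len(image) - K + 1, len(image[0]) - K + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(K) for b in range(K))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 0, 2],
         [0, 3, 0],
         [4, 0, 5]]
kernel = [[1, 2],
          [3, 4]]
result = event_conv2d_cyclic(image, kernel)
```

In this sketch the partial-sum storage is K × out_w entries rather than out_h × out_w, and a completed output row can be handed to the next layer as soon as it is emitted, which is the kind of behavior a fused cross-layer pipeline would rely on. How the actual accelerator organizes its buffers may differ.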