An Area-Efficient CNN Accelerator Supporting Global Average Pooling with Arbitrary Shapes

Yichuan Bai, Xiaopeng Zhang, Qian Wang, Jingjing Lv, Lei Chen, Yuan Du, Li Du
DOI: https://doi.org/10.1109/aicas59952.2024.10595877
2024-01-01
Abstract: Integrating dedicated convolutional neural network (CNN) accelerators into processing chips has become a common solution for efficient CNN inference in Internet-of-Things (IoT) devices. Fully in-accelerator processing of different computational layers is essential to support a wide range of CNN models. However, previous works lack an in-depth discussion of the hardware implementation of global average pooling (GAP) layers, which are widely used in classification models. This paper proposes a novel CNN accelerator with high area efficiency for event-driven IoT applications. Fully in-accelerator processing is supported for popular CNN models, such as MobileNet V2 and ResNet34. GAP layers with arbitrary shapes are also supported through software-hardware co-design, enabling low-cost deployment of customized CNN models. Compared with the reference, the proposed design reduces the gate count of the pooling module by 45.1% and improves the area efficiency of the overall CNN accelerator by 13.2%, with negligible accuracy loss.