A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture

Jiawei Xu, Yuxiang Huan, Boming Huang, Haoming Chu, Yi Jin, Li-Rong Zheng, Zhuo Zou
DOI: https://doi.org/10.1109/tcsii.2020.3038897
2021-01-01
Abstract: This brief presents a memory-efficient CNN accelerator design for resource-constrained devices in Internet of Things (IoT) and autonomous systems. A segmented logarithmic (SegLog) quantization method is used to reduce on-chip memory and bandwidth requirements, so that more processing elements (PEs) fit in a given chip area and can be organized into a reconfigurable multi-cluster architecture. The evaluation results show that SegLog quantization achieves $6.4\times$ model compression with less than 2.5% accuracy loss on various CNNs. An ASIC implementation with a 168-PE configuration is validated in a 40-nm CMOS process, with a reported energy efficiency of 2.54 TOPs/W and a chip area of 0.8 mm$^2$. The accelerator has also been implemented on an FPGA with 1512 PEs and 468 kB of on-chip memory, achieving a memory efficiency of 1.29 GOPs/kB. Compared with state-of-the-art accelerators, the ASIC implementation improves area efficiency and arithmetic intensity by $1.94\times$ and $5.62\times$, respectively, while the FPGA implementation improves memory efficiency by $2.34\times$.
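As a rough sanity check on the figures quoted above, the following sketch derives two implied quantities. It rests on two assumptions that the abstract does not state: that memory efficiency is throughput divided by on-chip memory, and that the compression ratio is measured against 32-bit weights. The function names are illustrative only and do not come from the paper.

```python
# Back-of-the-envelope checks on the abstract's reported numbers.
# ASSUMPTION: memory efficiency = throughput / on-chip memory.
# ASSUMPTION: the 6.4x compression baseline is 32-bit weights.


def implied_throughput_gops(mem_eff_gops_per_kb: float, on_chip_kb: float) -> float:
    """Throughput implied by a memory-efficiency figure and a memory size."""
    return mem_eff_gops_per_kb * on_chip_kb


def implied_bits_per_weight(baseline_bits: int, compression_ratio: float) -> float:
    """Average stored bits per weight implied by a compression ratio."""
    return baseline_bits / compression_ratio


# FPGA figures from the abstract: 1.29 GOPs/kB with 468 kB on-chip memory.
fpga_gops = implied_throughput_gops(1.29, 468)

# Compression figure from the abstract: 6.4x.
bits = implied_bits_per_weight(32, 6.4)

print(f"Implied FPGA throughput: {fpga_gops:.1f} GOPs")
print(f"Implied bits per weight: {bits:.1f}")
```

Under these assumptions the FPGA figures correspond to roughly 600 GOPs of throughput, and the compression ratio corresponds to about 5 bits per weight. If the paper defines either metric differently, these derived values do not apply.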