ON-CNN: Low Latency and High Throughput Online Arithmetic-Based Convolutional Neural Network Accelerator

Muhammad Akmal Shafique, Jeong-A Lee
DOI: https://doi.org/10.1109/access.2024.3502665
IF: 3.9
2024-12-04
IEEE Access
Abstract: Online arithmetic, also known as most-significant-digit-first (MSDF) computing, is becoming a popular choice for hardware implementations of machine learning algorithms. It generates the most significant digit (MSD) as the first output digit, which consecutive computations can then consume in a digit-serial manner, leading to benefits such as low latency, high throughput, variable precision, and early termination. Recently, the demand for real-time implementations of convolutional neural network (CNN) accelerators has increased dramatically. The latency and throughput of a CNN accelerator are of paramount importance for real-time object detection in video/image classification tasks. In this paper, we present an online arithmetic-based CNN accelerator architecture (ON-CNN), which exploits the computational capabilities of online arithmetic to achieve the lowest latency and highest throughput without affecting accuracy. It also employs a reconfigurable power-saving strategy that saves a significant percentage of power when executing different convolution layers. ON-CNN has two versions: one whose processing element (PE) takes input in a serial-serial manner (ON-CNNSS) and one whose PE takes input in a serial-parallel manner (ON-CNNSP). ON-CNNSS achieves execution-time speed-ups ranging from 5.7 to 145.6 times over state-of-the-art CNN accelerator architectures, whereas ON-CNNSP achieves speed-ups ranging from 6.4 to 162.0 times. ON-CNNSS and ON-CNNSP show the highest peak throughput of 2.276 TOPS and 2.535 TOPS, respectively, for the VGG-16 and ResNet-50 models. The proposed reconfigurable power-saving strategy saves 19.2%, 33.8%, and 18.9% of overall power while executing the VGG-16, ResNet-18, and ResNet-50 models, respectively. A comparison using metrics such as area efficiency (TOPS/mm²) and power efficiency (TOPS/W) shows that ON-CNNSS and ON-CNNSP perform better than state-of-the-art CNN accelerator architectures.
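The abstract names early termination as one benefit of MSDF computing. The sketch below is a software illustration of that idea only. It is not the ON-CNN processing element, and it makes no claim about how the paper implements termination. It assumes a radix-2 signed-digit fraction whose digits, drawn from {-1, 0, 1}, arrive most significant first. The function names `msdf_prefix_values` and `early_sign` are hypothetical. After j digits, the digits still to come can change the value by less than 2^-j, so the sign of the result is fixed as soon as the partial value exceeds that bound in magnitude.

```python
from fractions import Fraction


def msdf_prefix_values(digits):
    """Yield (j, partial_value, remaining_bound) after each MSD-first digit.

    Assumption: radix-2 signed digits in {-1, 0, 1}; the j-th digit
    (1-based) carries weight 2**-j. Digits not yet seen can change the
    value by strictly less than remaining_bound = 2**-j.
    """
    value = Fraction(0)
    for j, d in enumerate(digits, start=1):
        if d not in (-1, 0, 1):
            raise ValueError("signed digit must be -1, 0 or 1")
        value += Fraction(d, 2 ** j)
        yield j, value, Fraction(1, 2 ** j)


def early_sign(digits):
    """Return (sign, digits_consumed).

    The sign is reported as soon as the partial value lies outside the
    range that the unseen digits could still cancel. A sign of 0 means
    the stream ended without a decision at this precision.
    """
    j = 0
    for j, value, bound in msdf_prefix_values(digits):
        if value > bound:
            return 1, j
        if value < -bound:
            return -1, j
    return 0, j


# Hypothetical digit stream: the sign is known before the last digits arrive.
sign, used = early_sign([0, -1, 0, 1, 1])
print(sign, used)
```

One use one might imagine for such a test is skipping the remaining digits of a result whose sign already decides the outcome. Whether ON-CNN terminates on this condition is not stated in the abstract.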
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic