Abstract: The rapid development of optical imaging sensors, coupled with improvements in machine learning algorithms, has increased our ability to understand and extract information from scenic events. Convolutional neural networks (CNNs) are widely adopted to infer knowledge because of their success in automation, surveillance, and many other application domains. However, the heavy computational demand of convolution operations has limited their use in remote sensing edge devices. On these platforms, real-time processing remains challenging because of tight constraints on resources and power, and the transfer and processing of non-relevant image pixels act as a bottleneck for the entire system. This bottleneck can be overcome by placing the CNN inference architecture near the sensor, where it can exploit the high bandwidth available at the sensor interface. This paper presents an attention-based pixel processing architecture that facilitates CNN inference near the image sensor. We propose an efficient computation method that reduces dynamic power by decreasing the overall computation of the convolution operations. The method reduces redundancies using a hierarchical optimization approach: it exploits the spatio-temporal redundancies found in the incoming feature maps and performs computations only on regions selected by their relevance score. The proposed design addresses the mapping of computations onto an array of processing elements (PEs) and introduces a suitable network structure for communication. The PEs are highly optimized to provide low latency and low power for CNN applications. In designing the model, we draw on concepts from biological vision systems to reduce computation and energy.
We prototype the model on a Virtex UltraScale+ FPGA and implement it as an application-specific integrated circuit (ASIC) using the TSMC 90 nm technology library. The results suggest that the proposed architecture significantly reduces dynamic power consumption and achieves a high speedup, surpassing the computational capabilities of existing embedded processors.
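
The idea of computing convolutions only on regions selected by a relevance score can be illustrated with a minimal software sketch. This is not the architecture described in the abstract: the tile size, the threshold, the use of mean absolute frame difference as the relevance score, and every function name below are assumptions chosen for illustration. The sketch also ignores halo effects at tile borders, so reused outputs next to a changed tile are only approximate.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[List[float]]


def conv2d_at(frame: Frame, kernel: Frame, r: int, c: int) -> float:
    """Zero-padded 'same' 2D correlation evaluated at one output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(frame), len(frame[0])
    acc = 0.0
    for i in range(kh):
        for j in range(kw):
            y, x = r + i - kh // 2, c + j - kw // 2
            if 0 <= y < h and 0 <= x < w:
                acc += kernel[i][j] * frame[y][x]
    return acc


def tile_scores(curr: Frame, prev: Frame, tile: int) -> Dict[Tuple[int, int], float]:
    """Hypothetical relevance score: mean absolute temporal difference per tile."""
    h, w = len(curr), len(curr[0])
    scores = {}
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            cells = [(y, x)
                     for y in range(ty, min(ty + tile, h))
                     for x in range(tx, min(tx + tile, w))]
            total = sum(abs(curr[y][x] - prev[y][x]) for y, x in cells)
            scores[(ty, tx)] = total / len(cells)
    return scores


def selective_conv(curr: Frame,
                   prev: Optional[Frame],
                   prev_out: Optional[Frame],
                   kernel: Frame,
                   tile: int = 4,
                   threshold: float = 0.05) -> Tuple[Frame, int]:
    """Recompute only tiles whose score exceeds the threshold.

    Returns the output map and the number of output pixels recomputed.
    Outputs of low-scoring tiles are copied from prev_out (temporal reuse).
    """
    h, w = len(curr), len(curr[0])
    if prev is None or prev_out is None:
        # First frame: nothing to reuse, so compute every pixel.
        full = [[conv2d_at(curr, kernel, y, x) for x in range(w)] for y in range(h)]
        return full, h * w

    out = [row[:] for row in prev_out]
    recomputed = 0
    for (ty, tx), score in tile_scores(curr, prev, tile).items():
        if score <= threshold:
            continue  # tile judged redundant: keep the cached output
        for y in range(ty, min(ty + tile, h)):
            for x in range(tx, min(tx + tile, w)):
                out[y][x] = conv2d_at(curr, kernel, y, x)
                recomputed += 1
    return out, recomputed
```

In this sketch the second return value serves as a rough proxy for the work saved: when consecutive frames are identical, no pixel is recomputed, and a localized change triggers recomputation of only the tiles it touches.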

Cappuccino: Efficient Inference Software Synthesis for Mobile System-on-Chips

Neural Network Inference on Mobile SoCs

Enhancing Distributed In-Situ CNN Inference in the Internet of Things

ABM-SpConv-SIMD: Accelerating Convolutional Neural Network Inference for Industrial IoT Applications on Edge Devices

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors

Automated Exploration and Implementation of Distributed CNN Inference at the Edge

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

Towards an Efficient CNN Inference Architecture Enabling In-Sensor Processing

Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Towards An FPGA-targeted Hardware/Software Co-design Framework for CNN-based Edge Computing

CoCoPIE: Making Mobile AI Sweet As PIE --Compression-Compilation Co-Design Goes a Long Way

Efficient Inference of Large-Scale and Lightweight Convolutional Neural Networks on FPGA

Smartphone-based real-time object recognition architecture for portable and constrained systems

Characterizing the Deep Neural Networks Inference Performance of Mobile Applications

CAP: Communication-aware Automated Parallelization for Deep Learning Inference on CMP Architectures