Abstract: The rapid development of optical imaging technology, coupled with improvements in machine learning algorithms, has increased our ability to understand and extract information from events in a scene. Convolutional neural networks (CNNs) are widely adopted to infer knowledge, owing to their success in automation, surveillance, and many other application domains. However, the heavy computational demand of convolution operations has limited their use in remote-sensing edge devices. On these platforms, real-time processing remains challenging because of tight constraints on resources and power, and the transfer and processing of non-relevant image pixels act as a bottleneck on the entire system. This bottleneck can be overcome by designing a CNN inference architecture near the sensor, which exploits the high bandwidth available at the sensor interface. This paper presents an attention-based pixel processing architecture that facilitates CNN inference near the image sensor. We propose an efficient computation method that reduces dynamic power by decreasing the overall computation of the convolution operations. The method reduces redundancies using a hierarchical optimization approach: it exploits the spatio-temporal redundancies found in the incoming feature maps and performs computations only on regions selected by their relevance score. The proposed design addresses the mapping of computations onto an array of processing elements (PEs) and introduces a suitable network structure for communication. The PEs are highly optimized to provide low latency and low power for CNN applications. In designing the model, we draw on concepts from biological vision systems to reduce computation and energy.
We prototype the model on a Virtex UltraScale+ FPGA and implement it as an application-specific integrated circuit (ASIC) using the TSMC 90 nm technology library. The results suggest that the proposed architecture significantly reduces dynamic power consumption and achieves a high speedup, surpassing the computational capabilities of existing embedded processors.
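
The idea of computing convolutions only on relevant regions can be illustrated with a small software sketch. This is not the paper's architecture: the tile size, the relevance score (mean absolute difference between consecutive frames over a tile's receptive field), the threshold, and all function names are assumptions chosen for illustration. Output tiles whose input region has not changed reuse the cached result from the previous frame, and the sketch counts the multiply-accumulate operations (MACs) actually performed.

```python
# Illustrative sketch only. The relevance score, tile size, threshold, and all
# names here are assumptions, not the design described in the abstract.

def conv2d_valid(img, kernel):
    """Dense 2-D 'valid' cross-correlation on nested lists (reference result)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def tile_relevance(prev, curr, r0, r1, c0, c1):
    """Assumed relevance score: mean absolute temporal difference over
    input rows r0..r1-1 and columns c0..c1-1."""
    total = sum(abs(curr[r][c] - prev[r][c])
                for r in range(r0, r1) for c in range(c0, c1))
    return total / ((r1 - r0) * (c1 - c0))

def selective_conv(prev_frame, curr_frame, prev_out, kernel, tile=4, threshold=0.0):
    """Recompute only output tiles whose receptive field changed by more than
    `threshold`; other tiles reuse the cached output. Returns (output, MACs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(curr_frame) - kh + 1
    out_w = len(curr_frame[0]) - kw + 1
    out = [row[:] for row in prev_out]  # start from the cached feature map
    macs = 0
    for tr in range(0, out_h, tile):
        for tc in range(0, out_w, tile):
            r1, c1 = min(tr + tile, out_h), min(tc + tile, out_w)
            # Input region that this output tile depends on.
            score = tile_relevance(prev_frame, curr_frame,
                                   tr, r1 + kh - 1, tc, c1 + kw - 1)
            if score <= threshold:
                continue  # low relevance: keep the cached tile
            for r in range(tr, r1):
                for c in range(tc, c1):
                    out[r][c] = sum(curr_frame[r + i][c + j] * kernel[i][j]
                                    for i in range(kh) for j in range(kw))
                    macs += kh * kw
    return out, macs

# Usage: two 10x10 frames that differ in a single pixel.
prev_frame = [[(r * 10 + c) % 7 for c in range(10)] for r in range(10)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[1][1] += 5
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
cached = conv2d_valid(prev_frame, kernel)
result, macs_used = selective_conv(prev_frame, curr_frame, cached, kernel)
```

With a threshold of zero the result matches the dense convolution exactly, because a tile is skipped only when its entire receptive field is unchanged. In this example only one of the four output tiles is recomputed, so 144 MACs are performed instead of 576. A positive threshold would trade accuracy for further savings.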

Foveated image processing for faster object detection and recognition in embedded systems using deep convolutional neural networks

Towards Real-Time Object Detection on Embedded Systems

FOVEA: Foveated Image Magnification for Autonomous Navigation

CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

Efficient Object Detection Based on Masking Semantic Segmentation Region for Lightweight Embedded Processors

Differential Image-Based Scalable YOLOv7-Tiny Implementation for Clustered Embedded Systems

A Resource-Efficient Embedded Iris Recognition System Using Fully Convolutional Networks

Enhancing Lightweight Neural Networks for Small Object Detection in IoT Applications

Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection

Optimizing Face Recognition Inference with a Collaborative Edge–Cloud Network

Algorithm-Hardware Co-Optimization for Energy-Efficient Drone Detection on Resource-Constrained FPGA

DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units

An Optimized Face Recognition for Edge Computing

Semantic Segmentation Optimized for Low Compute Embedded Devices

Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving

Real-time Object Detection Towards High Power Efficiency

Towards an Efficient CNN Inference Architecture Enabling In-Sensor Processing

Fast and Accurate Object Detection in Remote Sensing Images Based on Lightweight Deep Neural Network

Real-Time Embedded Implementation of Improved Object Detector for Resource-Constrained Devices

Fast Object Detection with a Machine Learning Edge Device

Deep Learning-Based Multiple Object Visual Tracking on Embedded System for IoT and Mobile Edge Computing Applications