Abstract: Implementing deep learning models on edge devices presents significant challenges because these devices have limited computing power and memory. As deep learning models become more complex, ensuring efficient execution on edge platforms is critical for real-time applications. This review addresses these challenges by exploring hardware acceleration approaches. The problem lies in the resource-hungry nature of today’s deep learning models, which are typically designed for cloud environments with high computing capacity, making them unsuitable for edge environments with restricted resources. The main objective of this review is to analyze and compare various hardware acceleration strategies, such as graphics processing units (GPUs), tensor processing units (TPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs), together with model optimization techniques (pruning, quantization, and knowledge distillation), memory management, and data flow optimization, to improve the performance and energy efficiency of deep learning models on edge devices. The results of this comprehensive performance analysis suggest that hardware accelerators can significantly improve throughput and reduce latency while maintaining acceptable levels of power consumption and accuracy. Techniques such as quantization and pruning are seen to reduce computational load and memory footprint, enabling more efficient deep learning inference on edge platforms. The study also highlights the trade-offs between speed, power efficiency, and model accuracy for each hardware accelerator. The findings suggest that by choosing the right hardware and applying the right optimization techniques, edge devices may be able to run optimized deep learning models that meet the requirements of real-time AI applications in resource-constrained environments.
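
As a rough illustration of how pruning and quantization reduce memory footprint, the following minimal sketch applies magnitude pruning and symmetric 8-bit quantization to a toy weight vector. The weight values, the 50% sparsity level, and the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) are assumptions chosen for illustration; they are not taken from the review and do not represent the method of any specific accelerator.

```python
# Illustrative sketch only: toy weights, not data or code from the review.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights for a single small layer.
weights = [0.82, -0.05, 0.31, -0.64, 0.02, 0.11, -0.93, 0.47]

pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)

# Dense storage: 4 bytes per float32 weight versus 1 byte per int8 code.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(codes)

# Rounding error introduced by quantization on the pruned weights.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))

print("pruned weights:", pruned)
print("int8 codes:", codes)
print("bytes fp32 vs int8:", fp32_bytes, int8_bytes)
print("max quantization error:", max_error)
```

In this sketch the 8-bit codes take a quarter of the dense 32-bit storage, and the rounding error per weight is bounded by half the quantization step. The zeros produced by pruning only save memory or computation when the storage format or the accelerator can skip them, which is one reason the achievable benefit depends on the target hardware.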

Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA

Edge FPGA-based Onsite Neural Network Training

A comparative study of FPGA and CGRA technologies in hardware acceleration for deep learning

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

Efficient Hardware Acceleration Techniques for Deep Learning on Edge Devices: A Comprehensive Performance Analysis

Myocarditis: A clinical entity that can benefit from noninvasive imaging

Improvements in Interlayer Pipelining of CNN Accelerators Using Genetic Algorithms

CFEACT: A CGRA-based Framework Enabling Agile CNN and Transformer Accelerator Design

Sustainable AI Processing at the Edge

Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator

A Conv‐GEMM reconfigurable accelerator with WS‐RS dataflow for high throughput processing

Specializing CGRAs for Light-Weight Convolutional Neural Networks

A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network

DT-CGRA: Dual-track Coarse-Grained Reconfigurable Architecture for Stream Applications

Flip: Data-Centric Edge CGRA Accelerator

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

A high-speed reusable quantized hardware accelerator design for CNN on constrained edge device

Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

Stream Processing Dual-Track CGRA for Object Inference

A Low-Power Hardware Architecture for Real-Time CNN Computing