Abstract: As a key technology for enabling Artificial Intelligence (AI) applications in the 5G era, Deep Neural Networks (DNNs) have quickly attracted widespread attention. However, it is challenging to run computation-intensive DNN-based tasks on mobile devices because of their limited computation resources. Worse, traditional cloud-assisted DNN inference is heavily hindered by significant wide-area network latency, leading to poor real-time performance and low quality of user experience. To address these challenges, we propose Edgent, a framework that leverages edge computing for collaborative DNN inference through device-edge synergy. Edgent exploits two design knobs: (1) DNN partitioning, which adaptively partitions computation between device and edge in order to coordinate the powerful cloud resource and the proximal edge resource for real-time DNN inference; and (2) DNN right-sizing, which further reduces computing latency by exiting inference early at an appropriate intermediate DNN layer. In addition, considering the potential network fluctuation in real-world deployments, Edgent is designed to handle both static and dynamic network environments. Specifically, in a static environment, where the bandwidth changes slowly, Edgent derives the best configuration with the help of regression-based prediction models, while in a dynamic environment, where the bandwidth varies dramatically, Edgent generates the best execution plan through an online change point detection algorithm that maps the current bandwidth state to the optimal configuration. We implement an Edgent prototype on a Raspberry Pi and a desktop PC, and extensive experimental evaluations demonstrate Edgent's effectiveness in enabling on-demand, low-latency edge intelligence.
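The interplay of the two design knobs can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that per-layer latencies on the device and on the edge have already been predicted (for example by regression models), that the device runs the first p layers and the edge runs the rest, and that returning the final result costs nothing. All names (`Branch`, `best_partition`, `select_plan`) and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Branch:
    """One early-exit branch of a DNN. All values are hypothetical inputs."""
    accuracy: float          # accuracy obtained when exiting at this branch
    device_ms: List[float]   # predicted per-layer latency on the device
    edge_ms: List[float]     # predicted per-layer latency on the edge server
    output_kb: List[float]   # size of each layer's output
    input_kb: float          # size of the raw input


def transfer_ms(size_kb: float, bandwidth_mbps: float) -> float:
    """Time to send size_kb kilobytes over a bandwidth_mbps link (1 Mbps = 1000 kbit/s)."""
    return size_kb * 8.0 / bandwidth_mbps


def best_partition(branch: Branch, bandwidth_mbps: float) -> Tuple[int, float]:
    """Try every partition point p and return the (p, latency_ms) with the lowest latency.

    Assumption of this sketch: the device runs layers [0, p), the edge runs
    layers [p, n). p == n is device-only, p == 0 is edge-only.
    """
    n = len(branch.device_ms)
    best_p, best_latency = 0, float("inf")
    for p in range(n + 1):
        compute = sum(branch.device_ms[:p]) + sum(branch.edge_ms[p:])
        if p == n:
            sent_kb = 0.0                      # nothing leaves the device
        elif p == 0:
            sent_kb = branch.input_kb          # upload the raw input
        else:
            sent_kb = branch.output_kb[p - 1]  # upload the intermediate output
        latency = compute + transfer_ms(sent_kb, bandwidth_mbps)
        if latency < best_latency:
            best_p, best_latency = p, latency
    return best_p, best_latency


def select_plan(branches: List[Branch], bandwidth_mbps: float,
                deadline_ms: float) -> Optional[Tuple[int, int, float]]:
    """Pick the most accurate branch whose best partition meets the deadline.

    Returns (branch_index, partition_point, latency_ms), or None if no
    branch can meet the deadline at this bandwidth.
    """
    by_accuracy = sorted(range(len(branches)),
                         key=lambda i: branches[i].accuracy, reverse=True)
    for i in by_accuracy:
        p, latency = best_partition(branches[i], bandwidth_mbps)
        if latency <= deadline_ms:
            return i, p, latency
    return None


# Hypothetical example: a full three-layer branch and a two-layer early exit.
full = Branch(0.90, [40, 60, 80], [4, 6, 8], [100, 20, 1], 150)
early = Branch(0.80, [40, 30], [4, 3], [100, 1], 150)
print(select_plan([full, early], bandwidth_mbps=8, deadline_ms=130))
print(select_plan([full, early], bandwidth_mbps=8, deadline_ms=100))
```

With these made-up numbers, a looser deadline keeps the full model and a tighter one falls back to the early exit, which is the accuracy-for-latency trade that right-sizing describes. In a dynamic environment, the bandwidth argument would be refreshed whenever a change in bandwidth state is detected.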

Optimization and Deployment of DNNs for RISC-V-based Edge AI

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

Enabling Deep Learning on Edge Devices

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing

Scaling Up Deep Neural Network Optimization for Edge Inference

Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms

CoEdge: Cooperative DNN Inference With Adaptive Workload Partitioning Over Heterogeneous Edge Devices

EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference

Research on LLM Acceleration Using the High-Performance RISC-V Processor "Xiangshan" (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference

RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices