Abstract: International Journal of Software Engineering and Knowledge Engineering, Ahead of Print. In mobile edge computing environments, intelligent inference services driven by Deep Neural Networks (DNNs) are highly sensitive to latency. Recently, collaborative inference between User Devices and Edge Servers (ESs) based on DNN partitioning has succeeded in accelerating such services. However, most existing collaborative acceleration schemes partition a single DNN inference task; they cannot quickly make partition decisions for a set of concurrent inference tasks, and they often sacrifice inference accuracy. In addition, because ES resources are limited, concurrent requests compete for them, which prevents partitioned tasks from being offloaded to ESs in time for processing. An efficient offloading scheme is therefore essential. Task offloading schemes based on deep reinforcement learning can solve complex decision-making problems in high-dimensional state spaces, but they suffer from insufficient sample diversity and tend to fall into local optima. This paper proposes a Collaborative Inference Acceleration Scheme integrating DNN Partitioning and Task Offloading (CIAS-PnO). First, the Collaborative DNN Layer Partitioning (CDLP) algorithm is designed to minimize latency while preserving inference accuracy. CDLP uses a pruning operation to reduce the problem scale of partitioning concurrent inference tasks, so that partition decisions can be made in time. Then, the Distributed Soft Actor-Critic (SAC)-based Partition Task Offloading algorithm (DSACO) is designed. DSACO lets SAC Agents explore samples in parallel and share learning experiences, and it uses an automatic entropy adjustment mechanism to improve the Agents' exploration efficiency, thereby avoiding local optima and achieving efficient offloading of partitioned tasks.
Experimental results on DNN benchmarks show that, compared with the baseline acceleration schemes, CIAS-PnO improves acceleration performance by more than 19.8% and achieves better convergence performance and a higher task success rate.
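
To make the idea of layer-level DNN partitioning concrete, the sketch below picks a single split point for one chain-structured DNN by exhaustively comparing end-to-end latency. It is not the CDLP algorithm from the paper, which handles concurrent tasks and uses pruning. The function name, the per-layer latency and output-size inputs, and the simple latency model (device compute + upload + edge compute, with the cost of returning the result ignored) are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def best_partition_point(
    device_ms: Sequence[float],
    edge_ms: Sequence[float],
    output_kb: Sequence[float],
    input_kb: float,
    bandwidth_kb_per_ms: float,
) -> Tuple[int, float]:
    """Return (k, latency_ms) for the lowest-latency split.

    Layers [0, k) run on the user device and layers [k, n) run on the
    edge server. k == 0 offloads everything; k == n runs fully locally.
    All inputs are hypothetical profiling measurements.
    """
    n = len(device_ms)
    if not (len(edge_ms) == len(output_kb) == n):
        raise ValueError("per-layer sequences must have the same length")
    if bandwidth_kb_per_ms <= 0:
        raise ValueError("bandwidth must be positive")

    best_k, best_latency = 0, float("inf")
    device_prefix = 0.0          # device time for layers [0, k)
    edge_suffix = sum(edge_ms)   # edge time for layers [k, n)
    for k in range(n + 1):
        if k == n:
            transfer = 0.0  # fully local: nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the output of layer k - 1.
            payload = input_kb if k == 0 else output_kb[k - 1]
            transfer = payload / bandwidth_kb_per_ms
        latency = device_prefix + transfer + edge_suffix
        if latency < best_latency:
            best_k, best_latency = k, latency
        if k < n:
            # Move layer k from the edge side to the device side.
            device_prefix += device_ms[k]
            edge_suffix -= edge_ms[k]
    return best_k, best_latency
```

For example, with device times `[10.0, 20.0, 30.0]` ms, edge times `[1.0, 2.0, 3.0]` ms, layer outputs `[100.0, 10.0, 5.0]` KB, a 200 KB input, and 10 KB/ms of bandwidth, the function returns `(1, 25.0)`: running only the first layer on the device gives the lowest total latency under this model. With very low bandwidth the same call returns `k == n`, that is, fully local execution.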

Accelerating DNN Inference with Reliability Guarantee in Vehicular Edge Computing

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Toward Reliable DNN-Based Task Partitioning and Offloading in Vehicular Edge Computing

Adaptive Task Offloading in Vehicular Edge Computing Networks: a Reinforcement Learning Based Scheme

DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach

Computation Offloading with Reliability Guarantee in Vehicular Edge Computing Systems

Collaborative Inference Acceleration Integrating DNN Partitioning and Task Offloading in Mobile Edge Computing

Adaptive Task Offloading in Vehicular Edge Computing Networks Based on Deep Reinforcement Learning

Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

DNN Partition and Offloading Strategy with Improved Particle Swarm Genetic Algorithm in VEC

Reliable adaptive edge-cloud collaborative DNN inference acceleration scheme combining computing and communication resources in optical networks

Distributed Assignment With Load Balancing for DNN Inference at the Edge

Optimizing Task Offloading and Resource Allocation in Vehicular Edge Computing Based on Heterogeneous Cellular Networks

Neural Network-Based Game Theory for Scalable Offloading in Vehicular Edge Computing: A Transfer Learning Approach

A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing

An Adaptive DNN Inference Acceleration Framework with End–edge–cloud Collaborative Computing

On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee

Cooperative Computational Offloading in Mobile Edge Computing for Vehicles: A Model-Based DNN Approach

Digital twin-assisted resource allocation framework based on edge collaboration for vehicular edge computing

Multi-User Computation Offloading and Resource Allocation Algorithm in a Vehicular Edge Network