Abstract: With the development of the smart Internet of Things (IoT), there has been a surge in wireless devices that deploy Deep Neural Network (DNN) models for real-time computing tasks. However, the inherent resource and energy constraints of wireless devices make it impractical to complete real-time inference tasks locally. DNN model partitioning splits a DNN model so that edge servers can assist with part of the inference, but offloading also consumes considerable transmission energy. Additionally, because of the complex structure of DNN models, the choice of layer at which to partition and offload significantly affects overall energy consumption, which complicates the design of an optimal partitioning strategy. Furthermore, in certain application contexts, regular battery charging or replacement for smart IoT devices is impractical and environmentally harmful. Wireless energy transfer technology enables devices to harvest radio-frequency (RF) energy and thereby achieve a sustainable power supply. Motivated by this, we formulate a problem of joint DNN model partitioning and resource allocation in Wireless Powered Mobile Edge Computing (WPMEC). However, the time-varying channel state in WPMEC has a significant impact on resource allocation decisions, and jointly optimizing the DNN model partitioning and resource allocation decisions is also a significant challenge. We propose an online algorithm based on Deep Reinforcement Learning (DRL) to make the time allocation decision, which simplifies the Mixed Integer Nonlinear Programming (MINLP) problem into a convex optimization problem. Our approach seeks to maximize the completion rate of DNN inference tasks under time-varying wireless channel states and delay constraints. Simulation results show that the algorithm substantially improves task completion rates.
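To make the partitioning trade-off described in the abstract concrete, the following is a minimal sketch of how a partition point might be chosen by exhaustive search. It is not the DRL-based algorithm proposed in the paper, and it ignores energy harvesting and time allocation entirely. The `Layer` type, the function `best_partition`, the linear energy and latency model, and all numbers are hypothetical assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Layer:
    flops: float     # assumed computation cost of the layer (FLOPs)
    out_bits: float  # assumed size of the layer's output (bits)


def best_partition(
    layers: Sequence[Layer],
    input_bits: float,
    dev_flops_per_s: float,
    srv_flops_per_s: float,
    dev_joule_per_flop: float,
    rate_bps: float,
    tx_power_w: float,
    deadline_s: float,
) -> Optional[Tuple[int, float, float]]:
    """Return (k, device_energy_J, latency_s) for the feasible partition
    point k with the lowest device energy, or None if none meets the deadline.

    Partition point k means layers[:k] run on the device and layers[k:]
    run on the edge server (k = 0: full offload, k = len(layers): fully local).
    """
    n = len(layers)
    best = None
    for k in range(n + 1):
        local_flops = sum(layer.flops for layer in layers[:k])
        remote_flops = sum(layer.flops for layer in layers[k:])

        # Data sent over the wireless link at this partition point.
        if k == n:
            tx_bits = 0.0                     # nothing is offloaded
        elif k == 0:
            tx_bits = input_bits              # raw input is offloaded
        else:
            tx_bits = layers[k - 1].out_bits  # intermediate feature map

        t_local = local_flops / dev_flops_per_s
        t_tx = tx_bits / rate_bps
        t_server = remote_flops / srv_flops_per_s
        latency = t_local + t_tx + t_server

        # Device energy: local computation plus transmission.
        energy = local_flops * dev_joule_per_flop + tx_power_w * t_tx

        if latency <= deadline_s and (best is None or energy < best[1]):
            best = (k, energy, latency)
    return best


# Hypothetical three-layer profile; the first layer expands the data,
# later layers shrink it, so a later cut transmits less.
example_layers = [
    Layer(flops=2e8, out_bits=8e6),
    Layer(flops=4e8, out_bits=1e6),
    Layer(flops=6e8, out_bits=1e4),
]
```

Under these assumed numbers, cutting after the first layer is the worst choice because its output is larger than the raw input, which illustrates why the partition layer matters for both energy and delay. In the setting of the abstract, the channel rate varies over time, so a fixed choice of this kind would need to be recomputed or learned online.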

Distributed DNN Inference with Fine-grained Model Partitioning in Mobile Edge Computing Networks

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Delay-Aware DNN Inference Throughput Maximization in Edge Computing Via Jointly Exploring Partitioning and Parallelism

Deep Neural Network Task Partitioning and Offloading for Mobile Edge Computing

DNN Real-Time Collaborative Inference Acceleration with Mobile Edge Computing

Joint multi-user DNN partitioning and task offloading in mobile edge computing

DNN Inference Task Offloading Based on Distributed Soft Actor-Critic in Mobile Edge Computing

Throughput Maximization of Delay-Aware DNN Inference in Edge Computing by Exploring DNN Model Partitioning and Inference Parallelism

Model Parallelism Optimization for Distributed DNN Inference on Edge Devices

Task Partitioning and Offloading in DNN-Task Enabled Mobile Edge Computing Networks

Accelerating DNN Inference by Edge-Cloud Collaboration

Collaborative DNNs Inference with Joint Model Partition and Compression in Mobile Edge-Cloud Computing Networks

MoEI: Mobility-Aware Edge Inference Based on Model Partition and Service Migration

Joint DNN partitioning and task offloading in mobile edge computing via deep reinforcement learning

Joint DNN Partition Deployment and Resource Allocation for Delay-Sensitive Deep Learning Inference in IoT

An Adaptive Task Migration Scheduling Approach for Edge-Cloud Collaborative Inference

Hastening Stream Offloading of Inference Via Multi-Exit DNNs in Mobile Edge Computing

Joint Optimization of DNN Partition and Continuous Task Scheduling for Digital Twin-Aided MEC Network With Deep Reinforcement Learning

Joint DNN partitioning and resource allocation for completion rate maximization of delay-aware DNN inference tasks in wireless powered mobile edge computing

End-to-End Delay Minimization based on Joint Optimization of DNN Partitioning and Resource Allocation for Cooperative Edge Inference