Abstract: International Journal of Software Engineering and Knowledge Engineering, Ahead of Print. In mobile edge computing environments, intelligent inference services driven by Deep Neural Networks (DNNs) are highly sensitive to latency. Recently, collaborative inference between User Devices and Edge Servers (ESs) based on DNN partitioning has succeeded in accelerating such services. However, most existing collaborative acceleration schemes partition a single DNN inference task; they cannot quickly make partition decisions for a set of concurrent inference tasks, and they often sacrifice inference accuracy. In addition, because ES resources are limited, concurrent requests compete for them, which prevents partitioned tasks from being offloaded to ESs in time for processing. An efficient offloading scheme is therefore essential. Task offloading schemes based on deep reinforcement learning can solve complex decision-making problems in high-dimensional state spaces, but they suffer from insufficient sample diversity and easily fall into local optima. This paper proposes a Collaborative Inference Acceleration Scheme integrating DNN Partitioning and Task Offloading (CIAS-PnO). First, the Collaborative DNN Layer Partitioning (CDLP) algorithm is designed to minimize latency while preserving inference accuracy. CDLP uses a pruning operation to reduce the problem scale of partitioning concurrent inference tasks, so that partition decisions can be determined in time. Then, the Distributed Soft Actor-Critic (SAC)-based Partition Task Offloading algorithm (DSACO) is designed. DSACO lets SAC agents explore samples in parallel and share learning experience, and it uses the automatic entropy adjustment mechanism to improve the agents' exploration efficiency, thereby avoiding local optima and achieving efficient offloading of partitioned tasks.
Experimental results on DNN benchmarks show that, compared with the baseline acceleration schemes, CIAS-PnO improves acceleration performance by more than 19.8% and achieves better convergence and a higher task success rate.
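
To make the idea of latency-driven layer partitioning concrete, the following is a minimal sketch of choosing a split point for a single chain-structured DNN. It is not the CDLP algorithm from the abstract: it does an exhaustive scan for one task and models neither pruning, accuracy constraints, nor concurrent requests. The function name `best_partition_point`, its parameters, and the example profile numbers are all hypothetical, and the cost of returning the result to the device is assumed to be negligible.

```python
from typing import List, Tuple


def best_partition_point(
    device_ms: List[float],
    server_ms: List[float],
    output_kb: List[float],
    input_kb: float,
    bandwidth_kb_per_ms: float,
) -> Tuple[int, float]:
    """Pick the split k that minimizes end-to-end latency.

    Layers [0, k) run on the user device and layers [k, n) run on the
    edge server. k == 0 offloads everything; k == n runs fully on-device.
    All inputs are hypothetical per-layer profiles, not values from the paper.
    """
    n = len(device_ms)
    if len(server_ms) != n or len(output_kb) != n:
        raise ValueError("per-layer profiles must have the same length")
    if bandwidth_kb_per_ms <= 0:
        raise ValueError("bandwidth must be positive")

    best_k, best_latency = 0, float("inf")
    device_prefix = 0.0            # device time for layers [0, k)
    server_suffix = sum(server_ms)  # server time for layers [k, n)

    for k in range(n + 1):
        if k == 0:
            transfer_kb = input_kb          # ship the raw input
        elif k == n:
            transfer_kb = 0.0               # nothing leaves the device
        else:
            transfer_kb = output_kb[k - 1]  # ship layer k-1's output

        latency = device_prefix + transfer_kb / bandwidth_kb_per_ms + server_suffix
        if latency < best_latency:
            best_k, best_latency = k, latency

        if k < n:
            device_prefix += device_ms[k]
            server_suffix -= server_ms[k]

    return best_k, best_latency


# Hypothetical four-layer profile (milliseconds and kilobytes).
device_ms = [4.0, 6.0, 10.0, 8.0]
server_ms = [1.0, 1.5, 2.5, 2.0]
output_kb = [200.0, 50.0, 20.0, 4.0]

k, latency = best_partition_point(device_ms, server_ms, output_kb,
                                  input_kb=150.0, bandwidth_kb_per_ms=10.0)
print(k, latency)
```

With this profile the scan keeps the first two layers on the device, because the second layer's output is much smaller than both the raw input and the first layer's output. A scheme for concurrent tasks would additionally have to account for contention on the edge server, which this sketch ignores.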

Preemptive Switch Memory Usage to Accelerate Training Jobs with Shared In-Network Aggregation

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Enabling Switch Memory Management for Distributed Training with In-Network Aggregation

SRA: Switch Resource Aggregation for Application Offloading in Programmable Networks

Training Job Placement in Clusters with Statistical In-Network Aggregation

In-Network Aggregation with Transport Transparency for Distributed Training

AggTree: A Routing Tree with In-Network Aggregation for Distributed Training

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

ATP: In-network Aggregation for Multi-tenant Learning

Identifying Performance Bottleneck in Shared In-Network Aggregation During Distributed Training

Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches

Accelerating distributed reinforcement learning with in-switch computing

D2T: Dynamic Dual Threshold Policy of Shared-Memory in Data Center Switches

A Deep Learning Dataloader with Shared Data Preparation

No Worker Left (Too Far) Behind: Dynamic Hybrid Synchronization for In‐Network ML Aggregation

CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories

Collaborative Inference Acceleration Integrating DNN Partitioning and Task Offloading in Mobile Edge Computing

Learning Buffer Management Policies for Shared Memory Switches

Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing

Low-latency job scheduling with preemption for the development of deep learning

A Resource Pooling Switch Architecture with High Performance Scheduler