Abstract: As deep neural networks (DNNs) are increasingly used in a broad spectrum of intelligent edge applications, edge devices often need to provide inference services for multiple DNN models, and it is nontrivial for edge inference platforms to deliver high throughput and low latency simultaneously. Serving multiple DNN models on an edge device poses new challenges for scheduler design. First, edge devices should be able to schedule multiple heterogeneous DNN models efficiently to optimize system utilization. Second, inference requests may have different service level objectives (SLOs), which the scheduler must meet to ensure quality of service (QoS). To address these challenges, this paper proposes BCEdge, a novel learning-based scheduling framework that combines adaptive batching with concurrent execution of DNN inference services on edge devices. We first propose a shared memory policy to reduce memory contention among multiple DNN models. We then define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages branch-based deep reinforcement learning (DRL) to maximize utility by 1) optimizing the batch size, 2) automatically determining the number of concurrent instances for each DNN model, and 3) determining the shared memory configuration among multiple DNN models. In addition, a lightweight DNN-based prediction model in BCEdge achieves SLO awareness by reducing performance interference among multiple DNN models. Our prototype, implemented on various edge devices, shows that BCEdge improves utility by up to 37.6% and reduces memory usage by up to 38% on average compared to state-of-the-art schemes, while keeping the SLO violation rate within 5%.
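The abstract does not give BCEdge's utility function or how its actions are encoded. As a rough illustration only, the sketch below assumes a utility of throughput divided by latency (set to zero when the SLO is missed), a made-up performance model (`toy_profile`), and hypothetical discrete options for batch size, instance count, and shared memory. It brute-forces the joint action space instead of training a DRL agent, and compares the size of that joint space with the number of per-branch outputs a branching architecture would need.

```python
from dataclasses import dataclass
from itertools import product
from math import prod

# Hypothetical discrete options for each of the three action branches.
BATCH_SIZES = [1, 2, 4, 8, 16]
INSTANCE_COUNTS = [1, 2, 3, 4]
SHARED_MEMORY = [False, True]
BRANCHES = [BATCH_SIZES, INSTANCE_COUNTS, SHARED_MEMORY]


@dataclass
class Measurement:
    throughput: float  # requests per second
    latency_ms: float  # per-batch latency in milliseconds
    slo_ms: float      # latency SLO in milliseconds


def utility(m: Measurement) -> float:
    """Hypothetical utility: throughput per ms of latency, 0 if the SLO is missed."""
    if m.latency_ms > m.slo_ms:
        return 0.0
    return m.throughput / m.latency_ms


def toy_profile(batch: int, instances: int, shared: bool,
                slo_ms: float = 50.0) -> Measurement:
    """Made-up performance model standing in for on-device measurements."""
    # Extra instances add memory contention, which sharing memory reduces.
    contention = 1.0 if shared else 1.5
    latency_ms = 4.0 + 1.5 * batch + 2.0 * contention * (instances - 1)
    # Each instance finishes `batch` requests every `latency_ms` milliseconds.
    throughput = 1000.0 * batch * instances / latency_ms
    return Measurement(throughput, latency_ms, slo_ms)


def best_config() -> tuple:
    """Exhaustively score every joint action (a baseline, not DRL)."""
    return max(product(*BRANCHES), key=lambda a: utility(toy_profile(*a)))


joint_actions = prod(len(b) for b in BRANCHES)   # grows multiplicatively
branch_outputs = sum(len(b) for b in BRANCHES)   # grows additively

print(joint_actions, branch_outputs)  # prints 40 11
print(best_config())                  # prints (8, 4, True)
```

With one output head per branch, a branching agent scores 11 options instead of 40 joint actions, and the gap widens quickly as options are added. This is a common motivation for action-branching DRL, although the abstract does not state that it is BCEdge's exact reasoning.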

A generic communication scheduler for distributed DNN training acceleration

US-Byte: an Efficient Communication Framework for Scheduling Unequal-Sized Tensor Blocks in Distributed Deep Learning

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Accelerating Distributed DNN Training via Transport Layer Scheduling

A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters

Prophet: Speeding Up Distributed DNN Training with Predictable Communication Scheduling

Mercury: A Simple Transport Layer Scheduler to Accelerate Distributed DNN Training

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

BCEdge: SLO-Aware DNN Inference Services With Adaptive Batch-Concurrent Scheduling on Edge Devices

DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

DynaHB: A Communication-Avoiding Asynchronous Distributed Framework with Hybrid Batches for Dynamic GNN Training

Aries: A DNN Inference Scheduling Framework for Multi-core Accelerators

Adaptive Scheduling for Edge-Assisted DNN Serving

How Useful is Communication Scheduling for Distributed Training?

DeepScheduler: Enabling Flow-Aware Scheduling in Time-Sensitive Networking

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU