Abstract: The emergence of Mobile Edge Computing (MEC) alleviates the large transmission latency of traditional cloud computing. For compute-intensive requests such as video analysis, mobile users expect a good quality of experience (QoE) with negligible latency and reduced energy consumption. The popularity of smart devices lets users issue a series of compute-intensive, latency-sensitive requests anywhere, which may lead to bursty requests. A single resource-constrained edge server nearby can handle a small number of requests quickly, but it struggles when it encounters bursty compute-intensive requests. Despite the abundance of recently proposed schemes, the majority focus on efficiently scheduling pending requests on a single edge server and ignore the potential role of edge collaboration in scheduling bursty requests. Moreover, while some recent studies propose finishing a task using multiple devices, they focus on collaboration between mobile devices rather than between edge servers. Hence, we propose DeepLoad, an S2S system that schedules bursty requests collaboratively using reinforcement learning (RL). DeepLoad decouples the scheduling decision into AP selection, which sets the access point, and workload redistribution among collaborating servers. DeepLoad trains a neural network model that picks decisions for each request based on observations collected by mobile devices. DeepLoad learns to make scheduling decisions solely from the resulting performance of historical decisions rather than relying on pre-programmed models or specific assumptions about the environment. As a result, DeepLoad automatically learns the scheduling algorithm for each request and obtains a satisfactory QoE. We aim to maximize the fraction of requests finished before their attached deadlines.
Based on the Shanghai taxi trajectory data set, we design a simulator to obtain abundant samples, and use two GeForce GTX TITAN Xp GPUs to train the Actor–Critic network. Compared to state-of-the-art bandwidth-based and server-resource-based methods, DeepLoad achieves a significant improvement in the average fraction of requests finished before their deadlines.
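
To make the decoupled decision and the objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a request's workload can be split proportionally across collaborating servers, that the request finishes when the slowest server finishes, and that the chosen access point contributes a fixed uplink delay. All names (`split_workload`, `completion_time`, `deadline_hit_fraction`) and all numbers are hypothetical, and no learned policy is included: the AP choice and the shares are simply given as inputs.

```python
def split_workload(workload, shares):
    """Workload redistribution: divide one request among collaborating servers.

    `shares` are non-negative weights (hypothetical stand-in for the
    redistribution part of a scheduling decision); they must not all be zero.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("at least one server must receive a positive share")
    return [workload * s / total for s in shares]


def completion_time(parts, speeds, uplink_delay):
    """Assumed cost model: uplink delay of the selected AP plus the time
    taken by the slowest server that received a non-zero part."""
    busy = [p / v for p, v in zip(parts, speeds) if p > 0]
    return uplink_delay + max(busy)


def deadline_hit_fraction(completion_times, deadlines):
    """Objective stated in the abstract: fraction of requests that finish
    no later than their attached deadlines."""
    hits = sum(t <= d for t, d in zip(completion_times, deadlines))
    return hits / len(deadlines)


# Example with made-up numbers: one request of 10 work units, split evenly
# between two servers with speeds 5.0 and 2.5 units/s, AP uplink delay 0.1 s.
parts = split_workload(10.0, [1, 1])
t = completion_time(parts, [5.0, 2.5], uplink_delay=0.1)
fraction = deadline_hit_fraction([t, 3.0], [2.5, 2.5])
```

In this toy model, shifting more of the workload to the faster server would shorten the completion time, which is the kind of trade-off a learned policy would have to discover from observed outcomes.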

ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG

SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based Platforms

LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster

DeepBoot: Dynamic Scheduling System for Training and Inference Deep Learning Tasks in GPU Cluster

Elastic Deep Learning in Multi-Tenant GPU Clusters

Liquid: Intelligent Resource Estimation and Network-Efficient Scheduling for Deep Learning Jobs on Distributed GPU Clusters

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

Energy-Efficient GPU Clusters Scheduling for Deep Learning

GPU Cluster Scheduling for Network-Sensitive Deep Learning

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Dynamic Space-Time Scheduling for GPU Inference

SCHED²: Scheduling Deep Learning Training Via Deep Reinforcement Learning

BatOpt: Optimizing GPU-Based Deep Learning Inference Using Dynamic Batch Processing

CoFB: latency-constrained co-scheduling of flows and batches for deep learning inference service on the CPU–GPU system

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

Effective Elastic Scaling of Deep Learning Workloads

Multi-user Co-inference with Batch Processing Capable Edge Server

Learning Scheduling Bursty Requests in Mobile Edge Computing Using DeepLoad

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs