Abstract: To achieve high resource utilization, production clusters co-schedule diverse workloads onto shared resources, including both batch analytic jobs with short-lived tasks and long-running applications (LRAs) that run for hours to months. The microservice architecture has given rise to distributed LRAs (DLRAs), which comprise multiple interconnected microservices that execute in long-lived distributed containers and serve massive numbers of user requests. Detecting and mitigating QoS violations becomes even more intractable due to network uncertainties and latency propagation across dependent microservices. However, current resource managers are responsible only for allocating resources among applications and jobs, and are agnostic to runtime QoS such as application-level latency. State-of-the-art QoS-aware scheduling approaches are designed for monolithic applications and do not consider the spatio-temporal performance variability across distributed microservices. In this paper, we present Toposch, a new scheduling and execution framework that prioritizes the QoS of DLRAs while balancing the performance of batch jobs and maintaining high cluster utilization by harvesting idle resources. Toposch tracks the footprint of every request across microservices and applies critical path analysis to the end-to-end latency graph to identify microservices at high risk of QoS violation. Based on microservice-level and node-level risk assessment, we intervene in batch scheduling by adaptively reducing the resources visible to batch tasks, thereby delaying their execution to give way to DLRAs. We propose a prediction-based vertical resource auto-scaling mechanism, aided by resource-performance modeling and fine-grained resource inference and access control, for prompt recovery from QoS violations.
Toposch leverages cost-effective task preemption to keep the cost of preemption and resource reclamation low during auto-scaling. Toposch is integrated with Apache YARN, and experiments show that it outperforms other baselines in guaranteeing the performance of DLRAs, at an acceptable cost in batch job slowdown. On average, the tail latency of DLRAs in Toposch is merely 1.12x that of running alone, while the job completion time (JCT) of Spark analytic jobs increases by 26%.
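The critical path analysis described above can be illustrated with a small sketch. This is not Toposch's implementation: it assumes each request trace has already been reduced to a directed acyclic call graph with a per-microservice latency in ms, and it approximates end-to-end latency as the longest accumulated-latency path through that graph (fan-out calls are treated as running in parallel). The functions `critical_path` and `risky_services`, the `headroom` and `share` thresholds, and the example services are all hypothetical. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def critical_path(latency, calls):
    """Return (total_ms, path) for the longest-latency path in a call DAG.

    latency: dict mapping each microservice to its own latency in ms.
    calls:   dict mapping a caller to the downstream services it invokes.
    Assumes every callee appears in `latency` and the graph is acyclic
    (graphlib raises CycleError otherwise).
    """
    # graphlib expects node -> set of predecessors (here: its callers).
    preds = {service: set() for service in latency}
    for caller, callees in calls.items():
        for callee in callees:
            preds[callee].add(caller)

    best = {}    # best[s]: largest accumulated latency of any path ending at s
    parent = {}  # parent[s]: predecessor of s on that path (None for roots)
    for s in TopologicalSorter(preds).static_order():
        # Predecessors are already processed, so best[p] is final here.
        upstream = max(preds[s], key=lambda p: best[p], default=None)
        best[s] = latency[s] + (best[upstream] if upstream is not None else 0.0)
        parent[s] = upstream

    # The critical path ends at the node with the largest accumulated latency.
    node = max(best, key=best.get)
    total = best[node]
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return total, path[::-1]


def risky_services(latency, calls, slo_ms, headroom=0.9, share=0.3):
    """Hypothetical risk rule: if the critical path uses at least `headroom`
    of the SLO, flag critical-path services contributing >= `share` of it."""
    total, path = critical_path(latency, calls)
    if total < headroom * slo_ms:
        return []  # comfortably within the SLO, nothing flagged
    return [s for s in path if latency[s] / total >= share]


# Hypothetical trace: frontend fans out to auth and search; search to cache and db.
example_latency = {"frontend": 5, "auth": 10, "search": 40, "cache": 3, "db": 25}
example_calls = {"frontend": ["auth", "search"], "search": ["cache", "db"]}

print(critical_path(example_latency, example_calls))
print(risky_services(example_latency, example_calls, slo_ms=75))
```

In this toy trace the critical path is frontend, search, db (70 ms). With a 75 ms SLO the path uses more than 90% of the budget, so `search` and `db`, each contributing at least 30% of the end-to-end latency, are flagged; with a 100 ms SLO nothing is flagged. Restricting attention to critical-path services reflects the fact that, under this model, speeding up an off-path service such as `auth` or `cache` cannot reduce end-to-end latency.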

QoS-Aware Co-Scheduling for Distributed Long-Running Applications on Shared Clusters.
