Abstract: Cloud computing currently attracts many application developers to deploy their web applications in cloud data centers. Kubernetes, a well-known container orchestration platform for deploying web applications on cloud systems, offers an automatic scaling feature that meets clients' ever-changing demands with a reactive approach. This paper proposes a system architecture based on Kubernetes with a proactive custom autoscaler that uses a deep neural network model to handle the workload dynamically at run time. The proposed architecture is designed around the Monitor–Analyze–Plan–Execute (MAPE) loop. The main contribution of this paper is the proactive custom autoscaler, which focuses on the analysis and planning phases. In the analysis phase, a Bidirectional Long Short-Term Memory (Bi-LSTM) model is applied to predict the future number of HTTP requests. In the planning phase, a cooling-down period is implemented to mitigate the oscillation problem. In addition, a resource removal strategy is proposed that removes only part of the resources when the workload decreases, so that the autoscaler can respond faster when a burst of workload occurs. In experiments with two different realistic workloads, the Bi-LSTM model achieves better accuracy in both short- and long-term forecasting than not only the Long Short-Term Memory (LSTM) model but also the state-of-the-art statistical autoregressive integrated moving average (ARIMA) model. Moreover, it predicts 530 to 600 times faster than ARIMA models across the different workloads. Furthermore, compared with the LSTM model, the Bi-LSTM model performs better in terms of resource provisioning accuracy and elastic speedup. Finally, the proposed proactive custom autoscaler is shown to outperform the default Horizontal Pod Autoscaler (HPA) of Kubernetes in terms of accuracy and speed when provisioning and de-provisioning resources.
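
The planning-phase ideas in the abstract (a cooling-down period against oscillation, plus removing only part of the surplus on scale-in) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `ProactivePlanner`, the fixed per-pod capacity model, and the parameters `cooldown` and `removal_ratio` are hypothetical; the cooling-down period is assumed to gate only scale-in; and `predicted_load` is simply an input standing in for the Bi-LSTM forecast.

```python
import math
from dataclasses import dataclass


@dataclass
class ProactivePlanner:
    """Hypothetical planning-phase sketch; names and defaults are illustrative only."""
    capacity_per_pod: float     # assumption: requests per interval one pod can serve
    min_pods: int = 1
    max_pods: int = 20
    cooldown: int = 3           # assumption: intervals to wait after a scaling action before scaling in
    removal_ratio: float = 0.5  # assumption: fraction of surplus pods removed per scale-in step
    last_action: int = -10**9   # time of the most recent scaling action

    def plan(self, now: int, current_pods: int, predicted_load: float) -> int:
        # Pods needed to serve the forecast load, clamped to the allowed range.
        needed = math.ceil(predicted_load / self.capacity_per_pod)
        needed = max(self.min_pods, min(self.max_pods, needed))

        if needed > current_pods:
            # Scale out immediately so the forecast burst is covered in time.
            self.last_action = now
            return needed

        if needed < current_pods and now - self.last_action >= self.cooldown:
            # Scale in only after the cooling-down period, and remove only part
            # of the surplus so some capacity stays available for a new burst.
            surplus = current_pods - needed
            remove = max(1, math.floor(surplus * self.removal_ratio))
            self.last_action = now
            return current_pods - remove

        # Otherwise keep the current replica count.
        return current_pods


if __name__ == "__main__":
    planner = ProactivePlanner(capacity_per_pod=100)
    print(planner.plan(now=0, current_pods=2, predicted_load=750))  # 8: scale out at once
    print(planner.plan(now=1, current_pods=8, predicted_load=150))  # 8: still cooling down
    print(planner.plan(now=3, current_pods=8, predicted_load=150))  # 5: remove half of the 6 surplus pods
```

Scaling out immediately while delaying and damping scale-in trades a little over-provisioning for faster reaction to bursts. The default Kubernetes HPA applies a comparable idea through its scale-down stabilization window (300 seconds by default).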

A QoS-oriented Scheduling and Autoscaling Framework for Deep Learning

Dynamically Adjusting Scale of a Kubernetes Cluster under QoS Guarantee

An Improved Kubernetes Scheduling Algorithm for Deep Learning Platform

Scheduling Distributed Deep Learning Jobs in Heterogeneous Cluster with Placement Awareness

Speculative Container Scheduling for Deep Learning Applications in a Kubernetes Cluster

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

DRS: A deep reinforcement learning enhanced Kubernetes scheduler for microservice‐based system

DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters

UniSched: A Unified Scheduler for Deep Learning Training Jobs with Different User Demands

A multi-objective trade-off framework for cloud resource scheduling based on the Deep Q-network algorithm

Differentiate Quality of Experience Scheduling for Deep Learning Inferences with Docker Containers in the Cloud

An Optimal Network-Aware Scheduling Technique for Distributed Deep Learning in Distributed HPC Platforms

Deep Learning-Based Autoscaling Using Bidirectional Long Short-Term Memory for Kubernetes

KubFBS: A fine‐grained and balance‐aware scheduling system for deep learning tasks based on kubernetes

On a Meta Learning-based Scheduler for Deep Learning Clusters

Optimization of Task-Scheduling Strategy in Edge Kubernetes Clusters Based on Deep Reinforcement Learning

An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing