Abstract: Cloud computing is becoming the main computing and storage platform for today's major workloads. From Internet of Things and Industry 4.0 workloads to big data analytics and decision‐making jobs, cloud systems receive a massive number of tasks every day that need to be simultaneously and efficiently mapped onto cloud resources. Therefore, deriving an appropriate task scheduling mechanism that can minimize both tasks' execution delay and cloud resource utilization is of prime importance. Recently, the concept of cloud automation has emerged to reduce manual intervention and improve resource management in large‐scale cloud computing workloads. In this article, we capitalize on this concept and propose four deep and reinforcement learning‐based scheduling approaches to automate the process of scheduling large‐scale workloads onto cloud computing resources, while reducing both resource consumption and task waiting time. These approaches are: reinforcement learning (RL), deep Q networks, recurrent neural network long short‐term memory (RNN‐LSTM), and deep reinforcement learning combined with LSTM (DRL‐LSTM). Experiments conducted using real‐world datasets from Google Cloud Platform revealed that DRL‐LSTM outperforms the other three approaches. The experiments also showed that DRL‐LSTM reduces the CPU usage cost by up to 67% compared with shortest job first (SJF), and by up to 35% compared with both the round robin (RR) and improved particle swarm optimization (PSO) approaches. Moreover, our DRL‐LSTM solution decreases the RAM usage cost by up to 72% compared with SJF, up to 65% compared with RR, and up to 31.25% compared with the improved PSO.

Octopus: an End-to-end Multi-DAG Scheduling Method Based on Deep Reinforcement Learning

End-to-end Multi-Target Flexible Job Shop Scheduling with Deep Reinforcement Learning

Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning

Learning to Optimize DAG Scheduling in Heterogeneous Environment

Reinforcement Learning Based Online Scheduling of Multiple Workflows in Edge Environment

DGCQN: a RL and GCN combined method for DAG scheduling in edge computing

Learning to Schedule DAG Tasks

A Scheduling Algorithm Based on Reinforcement Learning for Heterogeneous Environments

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

Telemetry-aided cooperative multi-agent online reinforcement learning for DAG task scheduling in computing power networks

Deep and reinforcement learning for automated task scheduling in large‐scale cloud computing systems

A Dual-Agent Scheduler for Distributed Deep Learning Jobs on Public Cloud Via Reinforcement Learning

Large-scale Machine Learning Cluster Scheduling via Multi-agent Graph Reinforcement Learning

Multi-Agent Reinforcement Learning for Job Shop Scheduling in Dynamic Environments

Dynamic scheduling of tasks in cloud manufacturing with multi-agent reinforcement learning

Deep learning and optimization enabled multi-objective for task scheduling in cloud computing

A learning and evolution-based intelligence algorithm for multi-objective heterogeneous cloud scheduling optimization

A Novel Multi-Agent Reinforcement Learning Approach for Job Scheduling in Grid Computing

Weighted Double Deep Q-network Based Reinforcement Learning for Bi-Objective Multi-Workflow Scheduling in the Cloud

ETS-DDPG: an Energy-Efficient and QoS-guaranteed Edge Task Scheduling Approach Based on Deep Reinforcement Learning