Abstract: Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job scheduling and resource management, which are critical for optimizing system performance and ensuring timely and cost-effective service delivery. However, the dynamic and heterogeneous nature of cloud environments presents significant challenges for these tasks, as workloads and resource availability can fluctuate unpredictably. Traditional approaches, including heuristic and meta-heuristic algorithms, often struggle to adapt to these real-time changes due to their reliance on static models or predefined rules. Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges by enabling systems to learn and adapt policies based on continuous observations of the environment, facilitating intelligent and responsive decision-making. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing, analyzing their methodologies, performance metrics, and practical applications. We also highlight emerging trends and future research directions, offering valuable insights into leveraging DRL to advance both job scheduling and resource management in cloud computing.

What problem does this paper attempt to address?

The paper addresses the following problem: in cloud computing environments, traditional job scheduling and resource management methods struggle to cope with dynamic and heterogeneous conditions. Specifically:

1. **Dynamic and Heterogeneous Environments**: Workload and resource availability in the cloud fluctuate unpredictably, so traditional methods based on static models or predefined rules have difficulty adapting to real-time changes.
2. **Limitations of Traditional Methods**: Traditional heuristic and meta-heuristic algorithms (such as genetic algorithms and whale optimization algorithms) rely on prior knowledge and static optimization models, and perform poorly when task arrival times and resource requirements change rapidly.
3. **Optimizing System Performance and Ensuring Service Quality**: Effective job scheduling and resource management are crucial for optimizing system performance and ensuring timely, cost-effective service delivery, so a solution that can intelligently respond and adapt to these changes is required.

To address these problems, the paper examines Deep Reinforcement Learning (DRL) as a promising solution. DRL learns and adapts its policies through continuous interaction with the environment, enabling intelligent and responsive decision-making. This is reflected in the following aspects:

- **High Adaptability**: DRL can dynamically adjust its policies based on continuous observations of the environment to handle unpredictable changes in workload and resources.
- **Optimizing Resource Utilization**: By learning an effective policy, DRL can raise resource utilization, enhance system performance, and improve Quality of Service (QoS).
- **Reducing Operating Costs**: DRL helps minimize operating costs while still meeting the requirements of Service-Level Agreements (SLAs).

In conclusion, this paper explores how DRL can improve job scheduling and resource management in cloud computing to meet the challenges posed by dynamic and complex environments.
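
The idea of learning a scheduling policy through interaction with the environment can be illustrated with a toy sketch. It is not taken from the paper: it uses tabular Q-learning as a small stand-in for the deep networks that DRL methods use, and the two-server setup, the service rates, the load cap, and the reward (the negative queue length on the chosen server) are all assumptions made purely for illustration.

```python
import random
from collections import defaultdict

# All constants below are hypothetical and chosen only to keep the toy small.
N_SERVERS = 2          # number of servers the scheduler can dispatch to
SERVICE_RATE = (2, 1)  # work units each server completes per step (heterogeneous)
MAX_LOAD = 5           # queue lengths are capped so the state space stays tiny


def step(loads, action, job_size):
    """Put the job on server `action`, then let every server work for one step."""
    loads = list(loads)
    loads[action] = min(MAX_LOAD, loads[action] + job_size)
    reward = -loads[action]  # assumed proxy for the job's waiting time
    loads = [max(0, load - rate) for load, rate in zip(loads, SERVICE_RATE)]
    return tuple(loads), reward


def greedy_action(q, state):
    """Return the server with the highest learned value in this state."""
    values = q[state]
    return max(range(N_SERVERS), key=lambda a: values[a])


def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_SERVERS)  # state -> value per action
    for _ in range(episodes):
        state = (0,) * N_SERVERS  # every episode starts with empty queues
        for _t in range(steps):
            # Explore with probability epsilon, otherwise exploit the table.
            if rng.random() < epsilon:
                action = rng.randrange(N_SERVERS)
            else:
                action = greedy_action(q, state)
            job_size = rng.randint(1, 3)  # size is unknown when dispatching
            next_state, reward = step(state, action, job_size)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After `q = train()`, calling `greedy_action(q, (3, 0))` returns the server the learned table prefers when the first queue holds three units and the second is empty. A DRL scheduler of the kind surveyed would replace the table with a neural network so that much larger, continuous state spaces can be handled.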

Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review

Deep reinforcement learning-based methods for resource scheduling in cloud computing: a review and future directions

A Deep Reinforcement Learning-Based Model for Optimal Resource Allocation and Task Scheduling in Cloud Computing

A Novel Job Scheduling Model to Enhance Efficiency and Overall User Fairness of Cloud Computing Environment.

Energy efficient task scheduling based on deep reinforcement learning in cloud environment: A specialized review

Energy-aware systems for real-time job scheduling in cloud data centers: A deep reinforcement learning approach

Deep and reinforcement learning for automated task scheduling in large‐scale cloud computing systems

A2C-DRL: Dynamic Scheduling for Stochastic Edge-Cloud Environments Using A2C and Deep Reinforcement Learning

Job Scheduling in Hybrid Clouds With Privacy Constraints: A Deep Reinforcement Learning Approach

A novel deep reinforcement learning scheme for task scheduling in cloud computing

Reinforcement Learning based Workflow Scheduling in Cloud and Edge Computing Environments: A Taxonomy, Review and Future Directions

Optimized Deep Learning Schemes for Secured Resource Allocation and Task Scheduling in Cloud Computing - A Survey

Cost-aware scheduling systems for real-time workflows in cloud: An approach based on Genetic Algorithm and Deep Reinforcement Learning

A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling

Research on Cloud Computing Resources Provisioning Based on Reinforcement Learning

An Integrated Dynamic Resource Scheduling Framework in On-Demand Clouds.

H2O-Cloud: A Resource and Quality of Service-Aware Task Scheduling Framework for Warehouse-Scale Data Centers -- A Hierarchical Hybrid DRL (Deep Reinforcement Learning) based Approach

Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning

Scheduling of decentralized robot services in cloud manufacturing with deep reinforcement learning