Abstract: Scheduling problems pose significant challenges in resource, industry, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces the Reinforcement Learning environment and conducts empirical analyses, comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for Single- and Multi-Agent approaches. Results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and the Multi-Agent PPO algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches reveal challenges in cooperative learning but a scalable capacity. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need for algorithmic sophistication balanced with scalability for intelligent scheduling solutions.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the **Unrelated Parallel Machine Scheduling (UPMS)** problem, especially in the presence of setup times and resource limitations. Specifically, the research aims to optimize this complex scheduling problem with Multi-Agent Reinforcement Learning (MARL).

### Problem Background

Scheduling problems are of great significance in resource management, industrial production, and operations management. In particular, production scheduling involves the effective and economical allocation of limited resources to support the production process. However, these scheduling problems usually involve complex combinatorial optimization and are often NP-hard in real industrial environments. Traditional single-agent algorithms struggle with large-scale, complex scenarios, so new methods need to be explored.

### Research Objectives

The main objectives of this paper are:

1. **Introduce the MARL environment**: Build a multi-agent reinforcement learning environment for the UPMS problem.
2. **Compare different algorithms**: Compare the performance of MARL and single-agent algorithms through experiments and evaluate them in different scenarios.
3. **Explore deep neural network policies**: Use different deep neural network policies for the single-agent and multi-agent approaches.
4. **Propose effective solutions**: Verify the effectiveness of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in single-agent scenarios and the performance of the multi-agent PPO algorithm in multi-agent scenarios.

### Specific Problem Description

The Unrelated Parallel Machine Scheduling (UPMS) problem can be formalized as follows:

- **Machine set**: A set of M unrelated parallel machines \( M=\{m_i \mid i\in\{1, 2,\ldots,M\}\} \).
- **Task set**: A set of J tasks \( J = \{j_i \mid i\in\{1, 2,\ldots,J\}\} \).
- **Processing time**: The processing time \( pt_{jm} \) of task \( j \) on machine \( m \).
- **Setup time**: The setup time \( st_{j_i j_k m} \) incurred when task \( j_k \) follows task \( j_i \) on machine \( m \).
- **Worker set**: A set of W workers \( W=\{w_i \mid i\in\{1, 2,\ldots,W\}\} \).
- **Worker-machine compatibility**: The binary variable \( o_{wm} \) indicates whether worker \( w \) can operate machine \( m \): \( o_{wm} = 1 \) if compatible, and 0 otherwise.
- **Required number of workers**: The number of workers \( r_{jm} \) required for task \( j \) to be executed on machine \( m \).

The optimization objective function is:

\[ \text{Min } f(x)=w_1\cdot T(x)+w_2\cdot U(x)-w_3\cdot P(x) \]

where:

- \( T(x) \) represents the total task completion time, taking into account the task processing time and setup time.
- \( U(x) \) represents the resource utilization rate.
- \( P(x) \) represents the number of tasks executed.
- The weights \( w_1, w_2, w_3 \) adjust the importance of each objective.

### Main Challenges

1. **Complexity**: The UPMS problem is NP-hard, with a huge search space, so the global optimum is difficult to find.
2. **Multi-objective optimization**: The total completion time and setup time must be minimized while resource utilization is optimized at the same time.
3. **Multi-agent coordination**: Cooperation and coordination among agents in a multi-agent system is a challenge, especially in a dynamic environment.

By introducing the MARL method, this research aims to overcome these challenges and provide a more intelligent and efficient scheduling solution.
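To make the formalization above concrete, the following is a minimal Python sketch of a UPMS instance and of the weighted objective \( f(x) \). It is an illustration under stated assumptions, not the paper's implementation: the names `UPMSInstance`, `machine_time`, `feasible_machines`, and `objective` are hypothetical, \( T(x) \) is read here as the sum of processing and setup times over all machines, and \( U(x) \) is approximated as worker-time demanded divided by worker-time available over \( T(x) \), since the summary does not give its exact definition. The `feasible_machines` helper shows the kind of feasibility check that an action mask (as in Maskable PPO) could be built from.

```python
from dataclasses import dataclass


@dataclass
class UPMSInstance:
    machines: list  # machine ids
    workers: list   # worker ids
    pt: dict        # pt[(j, m)]: processing time of task j on machine m
    st: dict        # st[(j_prev, j_next, m)]: setup time between consecutive tasks on m
    o: dict         # o[(w, m)]: 1 if worker w can operate machine m, else 0
    r: dict         # r[(j, m)]: number of workers task j needs on machine m


def machine_time(inst, m, sequence):
    """Processing time plus sequence-dependent setup time on machine m."""
    total = 0
    for pos, j in enumerate(sequence):
        if pos > 0:
            total += inst.st[(sequence[pos - 1], j, m)]
        total += inst.pt[(j, m)]
    return total


def feasible_machines(inst, j, free_workers):
    """Machines that have enough free, compatible workers to run task j."""
    result = []
    for m in inst.machines:
        compatible = sum(inst.o[(w, m)] for w in free_workers)
        if compatible >= inst.r[(j, m)]:
            result.append(m)
    return result


def objective(inst, schedule, weights):
    """f(x) = w1*T(x) + w2*U(x) - w3*P(x) for a schedule {machine: [tasks]}."""
    w1, w2, w3 = weights
    # ASSUMPTION: T is the sum of per-machine processing and setup times.
    T = sum(machine_time(inst, m, seq) for m, seq in schedule.items())
    P = sum(len(seq) for seq in schedule.values())
    # ASSUMPTION: U is worker-time demanded over worker-time available in T.
    demanded = sum(inst.r[(j, m)] * inst.pt[(j, m)]
                   for m, seq in schedule.items() for j in seq)
    U = demanded / (len(inst.workers) * T) if T > 0 else 0.0
    return w1 * T + w2 * U - w3 * P


# Toy instance (made-up numbers): 2 machines, 3 tasks, 2 workers.
inst = UPMSInstance(
    machines=[0, 1],
    workers=[0, 1],
    pt={(0, 0): 3, (0, 1): 5, (1, 0): 4, (1, 1): 2, (2, 0): 6, (2, 1): 3},
    st={(a, b, m): 1 for a in range(3) for b in range(3) if a != b for m in range(2)},
    o={(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1},
    r={(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1, (2, 0): 2, (2, 1): 1},
)
schedule = {0: [0], 1: [1, 2]}
value = objective(inst, schedule, weights=(1.0, 1.0, 1.0))
mask = feasible_machines(inst, 2, free_workers=[0, 1])
```

In this toy instance, task 2 needs two workers on machine 0 but only worker 0 is compatible with it, so `mask` contains only machine 1. A per-agent variant of the same check could restrict each machine agent's actions in a multi-agent setup, although how the paper structures its agents is not specified in this summary.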

Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding.

Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling

Multi-Agent Reinforcement Learning for Job Shop Scheduling in Dynamic Environments

Solving job scheduling problems in a resource preemption environment with multi-agent reinforcement learning

Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems

Large-scale Machine Learning Cluster Scheduling via Multi-agent Graph Reinforcement Learning

Knowledge graph-enhanced multi-agent reinforcement learning for adaptive scheduling in smart manufacturing

Multi-Agent Reinforcement Learning for Real-Time Dynamic Production Scheduling in a Robot Assembly Cell

Using Multi-Agent Deep Reinforcement Learning For Flexible Job Shop Scheduling Problems

A two-stage RNN-based deep reinforcement learning approach for solving the parallel machine scheduling problem with due dates and family setups

Multi-Task Multi-Agent Reinforcement Learning for Real-Time Scheduling of a Dual-Resource Flexible Job Shop with Robots

Scalable Multi-Agent Reinforcement Learning for Residential Load Scheduling under Data Governance

Multi-Agent Reinforcement Learning for Extended Flexible Job Shop Scheduling

A review of research on reinforcement learning algorithms for multi-agents

Towards Efficient Multi-Agent Learning Systems

Real-Time Multi-Vehicle Scheduling in Tasks With Dependency Relationships Using Multi-Agent Reinforcement Learning

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Efficient Multi-agent Reinforcement Learning by Planning

Multi-agent Reinforcement Learning for Dynamic Dispatching in Material Handling Systems

Scalability Bottlenecks in Multi-Agent Reinforcement Learning Systems