Exploring Multi-Agent Reinforcement Learning for Unrelated Parallel Machine Scheduling

Maria Zampella, Urtzi Otamendi, Xabier Belaunzaran, Arkaitz Artetxe, Igor G. Olaizola, Giuseppe Longo, Basilio Sierra
2024-11-12
Abstract: Scheduling problems pose significant challenges in resource, industry, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces the Reinforcement Learning environment and conducts empirical analyses, comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for single- and Multi-Agent approaches. Results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and the Multi-Agent PPO algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches reveal challenges in cooperative learning but a scalable capacity. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need for algorithmic sophistication balanced with scalability for intelligent scheduling solutions.
Artificial Intelligence, Machine Learning, Multiagent Systems, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The problem this paper attempts to solve is the **Unrelated Parallel Machine Scheduling (UPMS)** problem, particularly in the presence of setup times and resource limitations. Specifically, the research aims to optimize this complex scheduling problem with a Multi-Agent Reinforcement Learning (MARL) method.

### Problem Background

Scheduling problems are of great significance in resource management, industrial production, and operations management. Production scheduling in particular involves the effective and economical allocation of limited resources to support the production process. However, these problems usually involve complex combinatorial optimization and are often NP-hard in real industrial settings. Traditional single-agent algorithms face challenges in large-scale, complex scenarios, so new methods need to be explored.

### Research Objectives

The main objectives of this paper are:

1. **Introduce the MARL environment**: Build a multi-agent reinforcement learning environment for the UPMS problem.
2. **Compare different algorithms**: Compare the performance of MARL and single-agent algorithms experimentally and evaluate them in different scenarios.
3. **Explore deep neural network policies**: Use different deep neural network policies for the single-agent and multi-agent approaches.
4. **Propose effective solutions**: Verify the effectiveness of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in single-agent scenarios and of the Multi-Agent PPO algorithm in multi-agent scenarios.

### Specific Problem Description

The Unrelated Parallel Machine Scheduling (UPMS) problem can be formalized as follows:

- **Machine set**: A set of \( M \) unrelated parallel machines \( M = \{m_i \mid i \in \{1, 2, \ldots, M\}\} \).
- **Task set**: A set of \( J \) tasks \( J = \{j_i \mid i \in \{1, 2, \ldots, J\}\} \).
- **Processing time**: The processing time \( pt_{jm} \) of task \( j \) on machine \( m \).
- **Setup time**: The setup time \( st_{j_i j_k m} \) incurred when task \( j_k \) follows task \( j_i \) on machine \( m \).
- **Worker set**: A set of \( W \) workers \( W = \{w_i \mid i \in \{1, 2, \ldots, W\}\} \).
- **Worker-machine compatibility**: The binary variable \( o_{wm} \) indicates whether worker \( w \) can operate machine \( m \): \( o_{wm} = 1 \) if compatible, and 0 otherwise.
- **Required number of workers**: The number of workers \( r_{jm} \) required to execute task \( j \) on machine \( m \).

The optimization objective function is:

\[ \text{Min } f(x) = w_1 \cdot T(x) + w_2 \cdot U(x) - w_3 \cdot P(x) \]

where:

- \( T(x) \) represents the total task completion time, taking into account both processing and setup times.
- \( U(x) \) represents the resource utilization rate.
- \( P(x) \) represents the number of tasks executed.
- The weights \( w_1, w_2, w_3 \) adjust the relative importance of each objective.

### Main Challenges

1. **Complexity**: The UPMS problem is NP-hard, with a huge search space, so finding the global optimum is difficult.
2. **Multi-objective optimization**: The total completion time and setup time must be minimized while resource utilization is optimized at the same time.
3. **Multi-agent coordination**: Cooperation and coordination among agents in a multi-agent system is a challenge, especially in a dynamic environment.

By introducing the MARL method, this research aims to overcome these challenges and provide a more intelligent and efficient scheduling solution.
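To make the weighted objective concrete, the following minimal Python sketch evaluates \( f(x) \) for a small hand-made schedule. It is an illustration under stated assumptions, not the paper's implementation: the names `Instance`, `completion_time`, `utilization`, and `objective` are hypothetical; \( T(x) \) is taken as the sum of processing and setup times over all machine sequences; and \( U(x) \) is approximated as worker-time demanded divided by worker-time available, a definition the summary does not specify. Worker-machine compatibility and concurrency constraints are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Toy UPMS instance (hypothetical container, not from the paper)."""
    pt: dict[tuple[str, str], float]        # (task, machine) -> processing time pt_jm
    st: dict[tuple[str, str, str], float]   # (prev task, next task, machine) -> setup time
    r: dict[tuple[str, str], int]           # (task, machine) -> required workers r_jm
    n_workers: int                          # size of the worker set W


# A schedule maps each machine to the ordered list of tasks it executes.
Schedule = dict[str, list[str]]


def completion_time(inst: Instance, schedule: Schedule) -> float:
    """T(x): processing plus sequence-dependent setup time, summed over machines."""
    total = 0.0
    for machine, tasks in schedule.items():
        prev = None
        for task in tasks:
            if prev is not None:
                total += inst.st[(prev, task, machine)]
            total += inst.pt[(task, machine)]
            prev = task
    return total


def utilization(inst: Instance, schedule: Schedule) -> float:
    """U(x): ASSUMED definition, worker-time demanded / worker-time available."""
    t = completion_time(inst, schedule)
    if t == 0:
        return 0.0
    demanded = sum(
        inst.r[(task, machine)] * inst.pt[(task, machine)]
        for machine, tasks in schedule.items()
        for task in tasks
    )
    return demanded / (inst.n_workers * t)


def objective(inst: Instance, schedule: Schedule,
              weights: tuple[float, float, float]) -> float:
    """f(x) = w1*T(x) + w2*U(x) - w3*P(x), to be minimized."""
    w1, w2, w3 = weights
    executed = sum(len(tasks) for tasks in schedule.values())  # P(x)
    return (w1 * completion_time(inst, schedule)
            + w2 * utilization(inst, schedule)
            - w3 * executed)


# Illustrative data: two machines, three tasks, two workers.
inst = Instance(
    pt={("j1", "m1"): 3.0, ("j2", "m1"): 2.0, ("j3", "m2"): 4.0},
    st={("j1", "j2", "m1"): 1.0},
    r={("j1", "m1"): 1, ("j2", "m1"): 2, ("j3", "m2"): 1},
    n_workers=2,
)
schedule = {"m1": ["j1", "j2"], "m2": ["j3"]}
value = objective(inst, schedule, weights=(1.0, 1.0, 1.0))
print(f"f(x) = {value:.2f}")
```

Because \( P(x) \) enters with a negative sign, scheduling more tasks lowers \( f(x) \), while longer completion times raise it. The weights set how strongly each term pulls on the result.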