Multi-agent Reinforcement Learning for Dynamic Dispatching in Material Handling Systems

Xian Yeow Lee, Haiyan Wang, Daisuke Katsumata, Takaharu Matsui, Chetan Gupta
2024-09-27
Abstract: This paper proposes a multi-agent reinforcement learning (MARL) approach to learn dynamic dispatching strategies, which are crucial for optimizing throughput in material handling systems across diverse industries. To benchmark our method, we developed a material handling environment that reflects the complexities of an actual system, such as various activities at different locations, physical constraints, and inherent uncertainties. To enhance exploration during learning, we propose a method to integrate domain knowledge in the form of existing dynamic dispatching heuristics. Our experimental results show that our method can outperform heuristics by up to 7.4 percent in terms of median throughput. Additionally, we analyze the effect of different architectures on MARL performance when training multiple agents with different functions. We also demonstrate that the MARL agents' performance can be further improved by using the first iteration of MARL agents as heuristics to train a second iteration of MARL agents. This work demonstrates the potential of applying MARL to learn effective dynamic dispatching strategies that may be deployed in real-world systems to improve business outcomes.
Machine Learning, Artificial Intelligence, Multiagent Systems
What problem does this paper attempt to address?
The paper addresses the problem of optimizing throughput in material handling systems through dynamic dispatching. Specifically, the authors propose a multi-agent reinforcement learning (MARL) approach to learn dynamic dispatching policies, with the aim of overcoming the limitations of traditional heuristic dispatching rules in complex material handling systems. Such rules struggle with inherent system uncertainties, complex interactions between subprocesses, and system changes caused by business expansion or contraction. To benchmark the method, the authors developed a material handling environment that reflects the complexities of a real-world system: various activities at different locations, physical constraints, and inherent uncertainties. To enhance exploration, they also propose a way to integrate existing dynamic dispatching heuristics into the learning process as domain knowledge. Experimental results show that the method can outperform heuristics by up to 7.4% in median throughput. The authors further analyze how different architectures affect MARL performance when training multiple agents with different functions, and show that using the first iteration of MARL agents as heuristics to train a second iteration of MARL agents can improve performance further. The work demonstrates the potential of MARL for learning effective dynamic dispatching policies that may be deployed in real-world systems to improve business outcomes.
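The summary does not spell out how the dispatching heuristics enter the exploration step, so the following is only a minimal sketch of one plausible scheme, not the authors' method: an epsilon-greedy rule in which an exploratory move follows the heuristic's suggestion with some probability instead of always sampling uniformly. The function name `heuristic_guided_action` and the parameters `epsilon` and `heuristic_prob` are hypothetical and chosen purely for illustration.

```python
import random
from typing import Sequence


def heuristic_guided_action(
    q_values: Sequence[float],
    heuristic_action: int,
    epsilon: float,
    heuristic_prob: float,
    rng: random.Random,
) -> int:
    """Choose a dispatching action index for one agent.

    Assumption (not from the paper): exploration is epsilon-greedy, and an
    exploratory step defers to an existing dispatching heuristic with
    probability `heuristic_prob`, otherwise it samples uniformly.
    """
    if rng.random() < epsilon:
        # Exploration branch: bias toward the heuristic's suggested action.
        if rng.random() < heuristic_prob:
            return heuristic_action
        return rng.randrange(len(q_values))
    # Exploitation branch: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]  # hypothetical value estimates for 3 destinations
    # Index 2 stands in for what a rule such as "shortest queue" would pick.
    print(heuristic_guided_action(q, heuristic_action=2, epsilon=0.2,
                                  heuristic_prob=0.5, rng=rng))
```

Under the same assumption, the two-iteration idea in the summary would amount to passing the action chosen by a trained first-iteration agent as `heuristic_action` when training the second iteration.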