Dynamic job-shop scheduling using graph reinforcement learning with auxiliary strategy

Zhenyu Liu, Haoyang Mao, Guodong Sa, Hui Liu, Jianrong Tan
DOI: https://doi.org/10.1016/j.jmsy.2024.01.002
IF: 12.1
2024-01-21
Journal of Manufacturing Systems
Abstract: The unpredictable variety of dynamic events in manufacturing systems poses a great challenge for tackling the job-shop scheduling problem (JSP), while most prior arts fail to strike a good balance between solution efficiency and dynamic adaptation. To this end, this paper outlines a graph reinforcement learning framework for solving dynamic JSP (DJSP) with stochastic processing time and machine breakdowns. The framework depicts DJSP as a Markov decision process (MDP) and expands the disjunctive graph representation of the state. Then a mixed graph Transformer network is proposed to extract state embeddings coupled with dynamic events, which combines the merits of two attention mechanisms and a spatial pyramid pooling module to flexibly fit different scheduling configurations. Further, a promising training algorithm called Phase Proximal Policy Optimization with Rollback is advanced to learn the optimal scheduling policy, which introduces an additional auxiliary phase to train the policy and value networks alternately for higher sample efficiency. Comprehensive experiments both on static benchmarks and dynamic instances as well as an actual engineering case indicate that the proposed framework exhibits significant superiority in fidelity and generalization compared to previous work in terms of solving DJSP.
engineering, manufacturing, industrial, operations research & management science
What problem does this paper attempt to address?
The paper addresses the Dynamic Job-Shop Scheduling Problem (DJSP), focusing on unpredictable events in manufacturing systems such as stochastic processing times and machine breakdowns. Its goal is a method that handles these events while improving the efficiency and adaptability of the resulting schedules. The main contributions can be summarized as follows:

1. **An effective scheduling framework**: The framework automatically adjusts its scheduling strategy to different production goals and configurations without the need for rescheduling.
2. **A novel graph representation module**: The Mixed Graph Transformer Network (MGTN) extracts state embeddings from disjunctive graphs of different sizes, which enhances the model's generalization capability.
3. **Phase Proximal Policy Optimization with Rollback (P3OR)**: This training method improves sample efficiency by sharing parameters between the policy and value functions while decoupling their training processes.
4. **Superior performance**: The proposed method outperforms traditional methods and advanced deep reinforcement learning algorithms on static benchmarks, dynamic instances, and a real-world case.

The paper first reviews existing research on dynamic scheduling, covering heuristic and learning-based methods, and points out their limitations, such as insufficient generalization capability caused by instance-level training. It then details the proposed framework, including its overall architecture, the graph representation learning method (MGTN), and the improved reinforcement learning training algorithm (P3OR). Finally, experiments validate the effectiveness and superiority of the proposed method.
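The disjunctive graph mentioned in the abstract is the standard way to encode a job-shop instance, and a small example may make it more concrete. The sketch below is not the paper's implementation: the toy instance `JOBS`, the function name `build_disjunctive_graph`, and the tuple-based node encoding are all assumptions made for illustration, and the usual dummy source and sink nodes as well as the paper's dynamic-event extensions are omitted. Directed conjunctive arcs fix the operation order within a job, while undirected disjunctive edges link operations of different jobs that compete for the same machine.

```python
from itertools import combinations

# Hypothetical toy instance: each job is an ordered list of
# (machine, duration) operations. Not taken from the paper.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],  # job 0
    [(0, 2), (2, 1), (1, 4)],  # job 1
    [(1, 4), (2, 3)],          # job 2
]


def build_disjunctive_graph(jobs):
    """Return (nodes, conjunctive_arcs, disjunctive_edges) for a JSP instance."""
    nodes = {}        # (job, op_index) -> (machine, duration)
    conjunctive = []  # directed arcs: fixed processing order within a job
    by_machine = {}   # machine -> operations that require it

    for j, ops in enumerate(jobs):
        for k, (machine, duration) in enumerate(ops):
            nodes[(j, k)] = (machine, duration)
            by_machine.setdefault(machine, []).append((j, k))
            if k > 0:
                conjunctive.append(((j, k - 1), (j, k)))

    # Undirected edges: operations of different jobs sharing one machine.
    disjunctive = []
    for ops in by_machine.values():
        for a, b in combinations(ops, 2):
            if a[0] != b[0]:
                disjunctive.append((a, b))

    return nodes, conjunctive, disjunctive


nodes, conjunctive, disjunctive = build_disjunctive_graph(JOBS)
print(len(nodes), len(conjunctive), len(disjunctive))
```

A schedule corresponds to choosing a direction for every disjunctive edge so that the resulting directed graph is acyclic. In a learning-based scheduler, each dispatching decision typically fixes some of these directions, which is why the graph can serve as the state of the decision process.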