A Novel Soft Actor–Critic Framework with Disjunctive Graph Embedding and Autoencoder Mechanism for Job Shop Scheduling Problems

Wenquan Zhang, Fei Zhao, Chuntao Yang, Chao Du, Xiaobing Feng, Yukun Zhang, Zhaoxian Peng, Xuesong Mei
DOI: https://doi.org/10.1016/j.jmsy.2024.08.015
IF: 12.1
2024-01-01
Journal of Manufacturing Systems
Abstract: The Job-Shop Scheduling Problem (JSSP) is a classic NP-hard combinatorial optimization problem, and the quality of its schedules directly affects the operational efficiency of manufacturing systems. Priority Dispatching Rules (PDRs) are often used to solve JSSP in practice, but designing effective PDRs is difficult and time-consuming, requires extensive domain knowledge, and still typically yields mediocre performance. In this paper, we introduce a novel reinforcement learning (RL) model, Disjunctive Graph Embedding with Autoencoder Mechanism for Job Shop Scheduling Problems (DGEAM-JSSP), designed to automate the learning of PDRs. The model uses a Graph Neural Network (GNN) to learn node features that capture the spatial structure of the JSSP graph representation. The resulting policy network is size-agnostic, which enables effective generalization to larger-scale instances. Additionally, we employ a transformer encoder, with parallel encoding and a self-attention mechanism, to capture long-term dependencies among operations in large-scale scheduling problems. The two modules are trained end to end using the Soft Actor-Critic (SAC) algorithm. Computational experiments show that, with a single training run, our agent learns a dispatching policy that surpasses PDRs and state-of-the-art RL frameworks specifically tailored to each JSSP instance size in solution quality, and surpasses OR-Tools in execution speed. Moreover, results on random and benchmark instances show that the learned policies generalize well to real-world instances and to significantly larger-scale scenarios involving up to 2000 operations.
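For readers unfamiliar with the two baseline ideas the abstract builds on, the following is a minimal sketch of a disjunctive graph for a JSSP instance and of a hand-written dispatching rule. It is not the DGEAM-JSSP model: the 3-job, 3-machine instance `JOBS`, the function names `disjunctive_graph` and `dispatch_spt`, and the choice of the shortest-processing-time (SPT) rule are all assumptions made for illustration. A learned policy would take the place of the `min(...)` selection step.

```python
from itertools import combinations

# Hypothetical instance: JOBS[j] is the ordered list of (machine, duration)
# operations of job j. Not taken from the paper.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]


def disjunctive_graph(jobs):
    """Return (conjunctive, disjunctive) edges over operation ids (job, index).

    Conjunctive edges link consecutive operations of the same job.
    Disjunctive edges link every pair of operations sharing a machine;
    a schedule fixes a direction for each of them.
    """
    conjunctive = [
        ((j, k), (j, k + 1))
        for j, ops in enumerate(jobs)
        for k in range(len(ops) - 1)
    ]
    by_machine = {}
    for j, ops in enumerate(jobs):
        for k, (machine, _) in enumerate(ops):
            by_machine.setdefault(machine, []).append((j, k))
    disjunctive = [
        pair for ops in by_machine.values() for pair in combinations(ops, 2)
    ]
    return conjunctive, disjunctive


def dispatch_spt(jobs):
    """Build a schedule one operation at a time with the SPT rule.

    At each step, the next unscheduled operation of every job is a candidate;
    the one with the shortest duration is dispatched (ties broken by job id).
    Returns the makespan of the resulting schedule.
    """
    next_op = [0] * len(jobs)      # index of each job's next operation
    job_ready = [0] * len(jobs)    # time at which each job is free
    machine_ready = {}             # time at which each machine is free
    makespan = 0
    remaining = sum(len(ops) for ops in jobs)
    while remaining:
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        j = min(candidates, key=lambda c: (jobs[c][next_op[c]][1], c))
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        end = start + duration
        job_ready[j] = end
        machine_ready[machine] = end
        next_op[j] += 1
        remaining -= 1
        makespan = max(makespan, end)
    return makespan


if __name__ == "__main__":
    conj, disj = disjunctive_graph(JOBS)
    print(len(conj), "conjunctive edges,", len(disj), "disjunctive edges")
    print("SPT makespan:", dispatch_spt(JOBS))
```

The rule here looks only at the duration of each candidate operation, which is the kind of myopic, hand-designed heuristic that a graph-based learned policy is meant to replace.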