Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling

Cong Zhang, Zhiguang Cao, Wen Song, Yaoxin Wu, Jie Zhang
2024-02-14
Abstract: Recent studies in using deep reinforcement learning (DRL) to solve job-shop scheduling problems (JSSP) focus on construction heuristics. However, their performance is still far from optimality, mainly because the underlying graph representation scheme is unsuitable for modelling partial solutions at each construction step. This paper proposes a novel DRL-guided improvement heuristic for solving JSSP, where graph representation is employed to encode complete solutions. We design a Graph Neural-Network-based representation scheme, consisting of two modules to effectively capture the information of dynamic topology and different types of nodes in graphs encountered during the improvement process. To speed up solution evaluation during improvement, we present a novel message-passing mechanism that can evaluate multiple solutions simultaneously. We prove that the computational complexity of our method scales linearly with problem size. Experiments on classic benchmarks show that the improvement policy learned by our method outperforms state-of-the-art DRL-based methods by a large margin.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the job-shop scheduling problem (JSSP). Existing deep reinforcement learning (DRL) methods for JSSP mainly learn construction heuristics, whose performance is still far from optimal. The paper attributes this mainly to the underlying graph representation scheme, which is unsuitable for modeling the partial solutions that arise at each construction step. It therefore proposes a DRL-guided improvement heuristic in which the graph representation encodes complete solutions. A representation scheme based on graph neural networks (GNNs), consisting of two modules, captures the dynamic topology and the different node types of the graphs encountered during improvement. To speed up solution evaluation during improvement, the paper also presents a message-passing mechanism that can evaluate multiple solutions simultaneously. The authors prove that the computational complexity of the method scales linearly with problem size, and experiments on classic benchmarks show that the learned improvement policy outperforms state-of-the-art DRL-based methods by a large margin.
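To make "evaluating a complete solution" concrete, the sketch below computes the makespan of one complete JSSP schedule by propagating earliest start times through a precedence graph. It assumes the common disjunctive-graph view, in which a solution fixes a processing order on every machine. It is a plain sequential longest-path computation for a single solution, not the paper's batched message-passing mechanism, and the function name `evaluate_makespan` and the input layout are illustrative assumptions.

```python
from collections import deque

def evaluate_makespan(jobs, machine_order):
    """Makespan of a complete schedule (hypothetical helper, not from the paper).

    jobs[j]          : list of (machine, duration) in the job's fixed order
    machine_order[m] : list of (job, op_index) in processing order on machine m
    """
    dur, succ, indeg = {}, {}, {}
    for j, ops in enumerate(jobs):
        for k, (_, d) in enumerate(ops):
            dur[(j, k)] = d
            succ[(j, k)] = []
            indeg[(j, k)] = 0

    def add_edge(u, v):
        succ[u].append(v)
        indeg[v] += 1

    # Job precedence: operation k must finish before operation k + 1 starts.
    for j, ops in enumerate(jobs):
        for k in range(len(ops) - 1):
            add_edge((j, k), (j, k + 1))
    # Machine precedence: consecutive operations in each machine's order.
    for order in machine_order.values():
        for u, v in zip(order, order[1:]):
            add_edge(u, v)

    # Propagate earliest start times in topological order (Kahn's algorithm).
    start = {node: 0 for node in dur}
    queue = deque(node for node, deg in indeg.items() if deg == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        finish = start[u] + dur[u]
        for v in succ[u]:
            start[v] = max(start[v], finish)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    if visited != len(dur):
        raise ValueError("machine orders create a cycle; schedule is infeasible")
    return max(start[node] + dur[node] for node in dur)

# Toy instance: 2 jobs, 2 machines (made-up data for illustration).
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
order_a = {0: [(0, 0), (1, 1)], 1: [(1, 0), (0, 1)]}
order_b = {0: [(1, 1), (0, 0)], 1: [(1, 0), (0, 1)]}
print(evaluate_makespan(jobs, order_a))  # 7
print(evaluate_makespan(jobs, order_b))  # 11
```

The two machine orders differ only in which operation runs first on machine 0, which is the kind of local change an improvement heuristic compares. Each operation and each edge is visited once, so this single-solution evaluation is linear in the size of the graph.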