Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning

Zhicong Zhang, Li Zheng, Michael X. Weng
DOI: https://doi.org/10.1007/s00170-006-0662-8
IF: 3.563
2006-01-01
The International Journal of Advanced Manufacturing Technology
Abstract: In this paper, we discuss a dynamic unrelated parallel machine scheduling problem with sequence-dependent setup times and machine–job qualification consideration. To apply the Q-Learning algorithm, we convert the scheduling problem into a reinforcement learning problem by constructing a semi-Markov decision process (SMDP), including the definition of the state representation, actions and the reward function. We use five heuristics, WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT, as actions and prove the equivalence of the reward function and the scheduling objective: minimisation of mean weighted tardiness. We carry out computational experiments to examine the performance of the Q-Learning algorithm and the heuristics. Experimental results show that Q-Learning consistently outperforms all five heuristics by a wide margin. Averaged over all test problems, the Q-Learning algorithm improved on WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT by 61.38%, 60.82%, 56.23%, 57.48% and 66.22%, respectively.
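The idea of using dispatching heuristics as Q-Learning actions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the state features, the reward function or the update rule, so the opaque state labels, the hypothetical class `HeuristicSelector`, the epsilon-greedy policy, the `gamma ** elapsed` discounting for variable decision intervals and all parameter values below are assumptions. Only the five heuristic names and the mean weighted tardiness objective come from the abstract; the objective is computed in its standard form, the average of w_j * max(0, C_j - d_j) over all jobs.

```python
import random
from collections import defaultdict

# The five dispatching heuristics named in the abstract, used here only as action labels.
ACTIONS = ["WSPT", "WMDD", "WCOVERT", "RATCS", "LFJ-WCOVERT"]


def mean_weighted_tardiness(jobs):
    """Average of w * max(0, C - d) over (weight, completion_time, due_date) tuples."""
    jobs = list(jobs)
    total = sum(w * max(0.0, c - d) for w, c, d in jobs)
    return total / len(jobs)


class HeuristicSelector:
    """Tabular Q-learning agent that picks one dispatching rule per decision (hypothetical)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor per unit of time (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> value, defaults to 0.0

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise take the best-known rule.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, elapsed=1.0):
        # Assumed SMDP-style discounting: the time between decisions varies,
        # so the next state's value is discounted by gamma ** elapsed.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + (self.gamma ** elapsed) * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In a simulation loop, one would call `choose` whenever a machine becomes idle, dispatch a job with the selected heuristic, and then call `update` with a reward that penalises tardiness (for example the negative weighted tardiness accrued since the previous decision, again an assumption rather than the paper's definition).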