An Improved Deep Q-learning Algorithm for a Trade-off Between Energy Consumption and Productivity in Batch Scheduling

Xu Zheng, Zhen Chen
DOI: https://doi.org/10.1016/j.cie.2024.109925
2024-01-01
Abstract: The single-batch machine, commonly found in industrial manufacturing, can process a group of jobs concurrently in variable-speed batches, so both energy consumption and processing time fluctuate with the chosen speed. Identifying an optimal balance between total energy consumption and makespan is challenging because the two are intricately related in real-world scenarios. This study introduces a mixed-integer programming model that determines optimal machine operating states for diverse workloads, meeting customer requirements while saving energy for sustainability. An energy-aware batch scheduling deep Q-learning network (EBSDQN) framework is developed, encompassing sequencing policies, batching rules, and speed adjustment policies. The framework targets an NP-hard problem that commercial solvers such as Gurobi often cannot solve to optimality within a 360-second time limit on small-scale instances. The EBSDQN design is strengthened with efficient action decoding policies, refined reward assessments, exploitative neighborhood rules, and a streamlined buffer training process. This approach not only reduces training time but also adapts dynamically in accordance with the principles of the Markov Decision Process (MDP) and deep neural networks. The developed algorithm is robust, reaching convergence after 600 episodes of training. In separate comparisons with a commercial solver, two single-objective algorithms, and two multi-objective algorithms, the algorithm consistently demonstrates superior overall performance on the same instances. A worst-case analysis further examines the influence of job features.
In summary, this study underscores the high sensitivity of both total energy consumption (TEC) and makespan (Cmax) to the distribution of job processing times, and suggests prioritizing TEC over Cmax in optimization for sustainable planning and operations. Furthermore, this research has the potential to accelerate the integration of artificial intelligence into the manufacturing sector.
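The trade-off between total energy consumption (TEC) and makespan (Cmax) described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the `Batch` class, the `evaluate` function, the rule that a batch lasts as long as its longest job divided by its speed, and the power law `base_power * speed ** alpha` are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Batch:
    # Hypothetical container, not taken from the paper.
    job_times: List[float]  # nominal processing times of the jobs in the batch
    speed: float            # speed multiplier for this batch (1.0 = nominal)


def evaluate(batches: List[Batch],
             base_power: float = 10.0,
             alpha: float = 2.0) -> Tuple[float, float]:
    """Return (Cmax, TEC) for batches processed back to back on one machine.

    Assumed model (illustrative only):
      - a batch lasts as long as its longest job, divided by its speed;
      - power draw grows as base_power * speed ** alpha.
    """
    makespan = 0.0
    energy = 0.0
    for batch in batches:
        duration = max(batch.job_times) / batch.speed
        power = base_power * batch.speed ** alpha
        makespan += duration
        energy += power * duration
    return makespan, energy


# Same batching, two speed settings.
slow = [Batch([4.0, 2.0], speed=1.0), Batch([3.0], speed=1.0)]
fast = [Batch([4.0, 2.0], speed=2.0), Batch([3.0], speed=2.0)]

print(evaluate(slow))  # (7.0, 70.0)
print(evaluate(fast))  # (3.5, 140.0)
```

Under these assumed parameters, doubling the speed halves Cmax but doubles TEC, which is the kind of conflict a speed adjustment policy has to resolve.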