Reinforcement Learning with Composite Rewards for Production Scheduling in a Smart Factory.

Tong Zhou, Dunbing Tang, Haihua Zhu, Liping Wang
DOI: https://doi.org/10.1109/access.2020.3046784
IF: 3.9
2020-01-01
IEEE Access
Abstract: Rapid advances in sensing and cloud technologies are transforming the manufacturing system into a data-rich environment and making production scheduling increasingly complex. Traditional offline scheduling methods are limited in their ability to handle low-volume, high-mix workorders with diverse design specifications. Simulation-based methods show promise for distributed scheduling of manufacturing jobs but are mostly implemented in a static manner, using historical data and empirical rules. Recently, artificial intelligence (AI) algorithms have attracted increasing interest as a way to solve dynamic scheduling problems in the manufacturing setting. However, it is difficult to use high-dimensional data for production scheduling while also considering multiple practical objectives for smart manufacturing (e.g., minimizing the makespan, reducing production costs, balancing workloads). Therefore, this paper presents a new AI scheduler with composite reward functions for data-driven dynamic scheduling of manufacturing jobs under uncertainty in a smart factory. Internet-enabled sensor networks are deployed in the smart factory to track the real-time status of workorders, machines, and material handling systems. A novel manufacturing value network is developed that takes high-dimensional data as input and learns the state-action values for real-time decision making. Based on reinforcement learning (RL), composite rewards help the AI scheduler learn efficiently to achieve multiple production scheduling objectives in real time. The proposed methodology is evaluated and validated with experimental studies in a smart manufacturing setting. Experimental results show that the new AI scheduler not only improves the multi-objective performance metrics in the production scheduling problem but also copes effectively with unexpected events (e.g., urgent workorders, machine failures) in manufacturing systems.
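The abstract does not give the form of the composite reward, so the following is only an illustrative sketch of how several scheduling objectives could be folded into one scalar RL reward. The weighted-sum form, the function name `composite_reward`, the three penalty terms, and the default weights are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import pstdev


def composite_reward(makespan_increase, cost_increase, machine_loads,
                     weights=(1.0, 1.0, 1.0)):
    """Hypothetical composite reward for one scheduling decision.

    makespan_increase: growth in makespan caused by the action (assumed input)
    cost_increase:     growth in production cost caused by the action (assumed input)
    machine_loads:     current workload of each machine (assumed input)
    weights:           relative importance of the three objectives (assumed values)
    """
    w_makespan, w_cost, w_balance = weights
    # Workload imbalance is measured here as the population standard
    # deviation of machine loads; 0.0 means perfectly balanced.
    imbalance = pstdev(machine_loads)
    # Every term is a penalty, so the reward is the negated weighted sum.
    return -(w_makespan * makespan_increase
             + w_cost * cost_increase
             + w_balance * imbalance)


if __name__ == "__main__":
    # Same makespan and cost impact, different workload balance.
    balanced = composite_reward(2.0, 1.0, [3.0, 3.0, 3.0])
    skewed = composite_reward(2.0, 1.0, [1.0, 5.0])
    print(balanced, skewed)
```

Under these assumptions, an action that leaves the machines evenly loaded receives a higher reward than one with the same makespan and cost impact but a skewed load, which is the kind of trade-off a multi-objective reward is meant to expose to the learner.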