Unsupervised reward engineering for reinforcement learning controlled manufacturing

Thomas Hirtz, He Tian, Yi Yang, Tian-Ling Ren
DOI: https://doi.org/10.1007/s10845-024-02491-3
IF: 8.3
2024-10-24
Journal of Intelligent Manufacturing
Abstract: Reward engineering is a key challenge in reinforcement learning (RL) that can significantly affect the performance and applicability of RL algorithms. In the field of manufacturing, shaping the reward function for RL algorithms can be particularly difficult due to the complex and multi-objective nature of the manufacturing process. To address these challenges, we propose an unsupervised reward engineering method based on a variational autoencoder (VAE) that uses the latent representation of the product to compute the environment's reward. Our approach optimizes the underlying distribution of the fabricated product directly by leveraging the latent space distance or divergence between the manufactured and ideal products. This strategy circumvents issues commonly associated with conventional reward engineering, such as misaligned and hacked rewards. Our technique enables convenient multi-objective optimization and reward value bounding. Through a β-VAE architecture, we can adjust the weight of the Kullback–Leibler divergence term, prioritizing ideal characteristics or latent distribution based on the desired outcome. Applying our approach to semiconductor manufacturing, we demonstrate its benefits, including effective multi-objective optimization, stable reward, and meaningful data representations. Our method shows promise for optimizing complex manufacturing processes with RL and can be extended to various manufacturing-related fields. It can enhance product quality and offers opportunities for cross-facility manufacturing matching.
engineering, manufacturing, computer science, artificial intelligence
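
The idea in the abstract of scoring a manufactured product by its latent-space distance or divergence from an ideal product can be illustrated with a minimal sketch. It assumes the VAE encoder outputs a diagonal Gaussian (mean and log-variance) for each product. The function names, the exp(-d) mapping used to bound the reward, and the default beta value are illustrative assumptions, not the formulas used in the paper.

```python
import numpy as np


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) between two diagonal Gaussians given means and log-variances."""
    mu_p, logvar_p, mu_q, logvar_q = (
        np.asarray(a, dtype=float) for a in (mu_p, logvar_p, mu_q, logvar_q)
    )
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    return 0.5 * np.sum(
        logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def latent_reward(mu_prod, logvar_prod, mu_ideal, logvar_ideal, mode="kl"):
    """Hypothetical bounded reward in (0, 1] from latent encodings.

    mode="kl" uses the divergence between the two latent distributions;
    mode="distance" uses the Euclidean distance between the latent means.
    The exp(-d) squashing is an assumption chosen only to bound the reward.
    """
    if mode == "kl":
        d = kl_diag_gaussians(mu_prod, logvar_prod, mu_ideal, logvar_ideal)
    elif mode == "distance":
        d = np.linalg.norm(np.asarray(mu_prod, float) - np.asarray(mu_ideal, float))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return float(np.exp(-d))


def beta_vae_loss(recon_error, mu, logvar, beta=4.0):
    """Standard beta-VAE objective: reconstruction error + beta * KL(q(z|x) || N(0, I))."""
    zeros = np.zeros_like(np.asarray(mu, dtype=float))
    return float(recon_error + beta * kl_diag_gaussians(mu, logvar, zeros, zeros))
```

In this sketch a product whose encoding matches the ideal one receives the maximum reward of 1, and the reward decays toward 0 as the encodings drift apart. A larger `beta` weights the Kullback–Leibler term more heavily during VAE training.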
What problem does this paper attempt to address?