Offline Constrained Reinforcement Learning for Batch-to-batch Optimization of Cobalt Oxalate Synthesis Process

Runda Jia, Mingchuan Zhang, Jun Zheng, Dakuo He, Fei Chu, Kang Li
DOI: https://doi.org/10.1016/j.cherd.2024.08.013
2024-01-01
Abstract: Cobalt oxalate synthesis, a batch process, plays a crucial role in the refinement of cobalt metal. The mean particle size of cobalt oxalate is a critical indicator of product quality, but excessive ammonium oxalate solution flow raises waste disposal costs in production. To address both concerns, we propose a novel offline reinforcement learning (RL) algorithm that guarantees compliance with constraints in the cobalt oxalate synthesis process while using only static datasets. The method employs cost critic networks to assess costs and introduces Lagrangian multipliers to transform the constrained optimization problem into an unconstrained one. We use an exponential moving average (EMA) to refine the proportional-integral-derivative (PID) controlled update of the multipliers, which reduces overshoot and oscillation in the control process and improves the overall stability of the system. Furthermore, to improve algorithm performance, a deep residual network (DResNet) is integrated into the policy network. Experimental results indicate that the policy learned by the algorithm performs significantly better under constraints.
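To illustrate the multiplier mechanism the abstract describes, the following is a minimal Python sketch of a PID-controlled Lagrange multiplier whose constraint-violation signal is smoothed with an EMA. It is not the authors' implementation: the class name `PIDLagrangeMultiplier`, the function `lagrangian_objective`, the gains `kp`, `ki`, `kd`, the smoothing factor `ema_alpha`, and the choice to apply the EMA to the violation signal are all assumptions made for illustration. The abstract does not say where the EMA enters the update.

```python
from dataclasses import dataclass


@dataclass
class PIDLagrangeMultiplier:
    """Illustrative PID update of a Lagrange multiplier with EMA smoothing.

    All gains and the smoothing factor are hypothetical values,
    not taken from the paper.
    """
    cost_limit: float = 1.0   # constraint threshold on the estimated cost
    kp: float = 0.1           # proportional gain (assumed)
    ki: float = 0.01          # integral gain (assumed)
    kd: float = 0.05          # derivative gain (assumed)
    ema_alpha: float = 0.9    # EMA smoothing factor (assumed)
    integral: float = 0.0
    ema_error: float = 0.0

    def update(self, estimated_cost: float) -> float:
        """Return the new multiplier given the cost critic's estimate."""
        # Constraint violation: positive when the cost exceeds the limit.
        error = estimated_cost - self.cost_limit
        # Smooth the violation signal with an exponential moving average.
        prev_ema_error = self.ema_error
        self.ema_error = (self.ema_alpha * self.ema_error
                          + (1.0 - self.ema_alpha) * error)
        # Integral term, clipped at zero so it cannot go negative.
        self.integral = max(0.0, self.integral + self.ki * self.ema_error)
        # Derivative term reacts only to a rising violation.
        derivative = max(0.0, self.ema_error - prev_ema_error)
        # The multiplier itself must stay non-negative.
        return max(0.0, self.kp * self.ema_error
                   + self.integral + self.kd * derivative)


def lagrangian_objective(reward_value: float, cost_value: float,
                         multiplier: float) -> float:
    """Unconstrained objective: reward estimate minus weighted cost estimate."""
    return reward_value - multiplier * cost_value
```

In this sketch, a cost estimate that stays above `cost_limit` makes the multiplier grow, which weights the cost critic more heavily in the policy objective. A cost estimate below the limit lets the multiplier fall back toward zero.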