Optimal Tracking Control of Nonlinear Batch Processes with Unknown Dynamics Using Two-Dimensional Off-Policy Interleaved Q-learning Algorithm

Huiyuan Shi, Wei Gao, Xueying Jiang, Chengli Su, Ping Li
DOI: https://doi.org/10.1080/00207179.2023.2267701
IF: 2.102
2024-01-01
International Journal of Control
Abstract: A novel two-dimensional (2D) off-policy interleaved Q-learning algorithm is proposed to handle the optimal tracking control problem without prior knowledge of nonlinear batch processes and an initial control policy, which overcomes the drawback that the system dynamic parameters change intermittently and the difficulty of obtaining the initial parameters, greatly reducing the computational difficulty of the optimal policy. Consequently, three-layer neural networks, including the model network, the critic network and the action network are designed as the approximate parameter structure to search for a control policy via the 2D off-policy interleaved Q-learning algorithm. The weights in each layer of the neural network are continuously learned and renewed by historical data in both time and batch directions in order to obtain the optimal control policy. After that, the convergence and optimality are scrupulously verified. Ultimately, the simulation results of the injection stage confirmed the validity and feasibility of the proposed algorithm.