Deterministic reinforcement learning for optimized formation control of virtually-coupled trains via performance index monitor

Shigen Gao, Chaoan Xu, Hang Zhang, Ning Zhao, Tuo Shen, Hairong Dong
DOI: https://doi.org/10.1016/j.eswa.2023.121421
IF: 8.5
2023-09-08
Expert Systems with Applications
Abstract: This article presents a deterministic reinforcement learning-based optimized formation control scheme for virtually-coupled trains using a newly designed performance index monitor, forming a new paradigm of artificial intelligence-based control for virtually-coupled trains. First, a reinforcement learning-based optimized formation control is designed using backstepping and a neural network-based actor-critic framework. By approximating online the solutions of the Hamilton–Jacobi–Bellman equations for both the virtual position-adjusting and the actual speed-regulating control signals, it achieves steady formation separation among multiple trains and provides a key theoretical basis for the "moving block with soft wall" (or virtually-coupled) mode. Second, a hysteretic-lag performance index monitor is designed to determine the time instants at which enough knowledge has been obtained, or at which the available knowledge has become insufficient in a dynamic environment, and to switch the reinforcement learning-based optimized formation control to the deterministic reinforcement learning mode and back accordingly. The idea of deterministic reinforcement learning is that, once convergence of the actor and critic neural networks is observed (equivalently, once sufficient knowledge has provisionally been obtained), the inner weight vectors of the neural networks can be fixed as deterministic constants, achieving human-like learning and control and reducing the computational load. Rigorous closed-loop stability under deterministic reinforcement learning is established using Lyapunov stability theory. Finally, with the designed performance index monitor, the number of switches between reinforcement learning and deterministic reinforcement learning is guaranteed to be finite over a finite running duration. Simulation examples are given to verify the effectiveness of the proposed control.
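The switching idea in the abstract can be illustrated with a generic two-threshold (hysteresis) rule. The sketch below is not the paper's monitor: the class name, the scalar `index`, and the thresholds `low` and `high` are all hypothetical, and the performance index is assumed to be a single number that shrinks as the actor and critic networks converge. It only shows how a hysteresis band lets a controller freeze its weights, resume learning when the index grows again, and avoid chattering between the two modes.

```python
from dataclasses import dataclass


@dataclass
class HystereticMonitor:
    """Two-threshold switch between a learning mode and a frozen-weight mode.

    Hypothetical illustration only; names and thresholds are assumptions,
    not taken from the paper. Requires low < high.
    """
    low: float             # freeze weights once the index falls below this
    high: float            # resume learning once the index rises above this
    learning: bool = True  # start in the ordinary (adaptive) learning mode
    switches: int = 0      # number of mode changes so far

    def update(self, index: float) -> bool:
        """Return True if the networks should keep adapting at this step."""
        if self.learning and index < self.low:
            self.learning = False   # enough knowledge: fix the weights
            self.switches += 1
        elif not self.learning and index > self.high:
            self.learning = True    # knowledge insufficient: learn again
            self.switches += 1
        return self.learning


monitor = HystereticMonitor(low=0.1, high=0.5)
trace = [0.9, 0.4, 0.08, 0.3, 0.45, 0.6, 0.2, 0.05]  # made-up index values
modes = [monitor.update(x) for x in trace]
```

Values between `low` and `high` leave the mode unchanged, so small fluctuations of the index around a single threshold cannot cause rapid back-and-forth switching.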
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
What problem does this paper attempt to address?