Closed‐loop stability analysis of deep reinforcement learning controlled systems with experimental validation

Mohammed Basheer Mohiuddin, Igor Boiko, Rana Azzam, Yahya Zweiri
DOI: https://doi.org/10.1049/cth2.12712
IF: 2.67
2024-06-28
IET Control Theory and Applications
Abstract: This research investigates the closed‐loop stability of dynamic systems controlled by deep reinforcement learning (DRL) agents through Lyapunov analysis and a linear‐quadratic polynomial approximation of the trained agent. The study validates its approach with simulations and experiments on real‐world hardware, confirming the effectiveness of DRL control and identifying critical operational thresholds and stability margins for practical applications. Trained DRL‐based controllers can effectively control dynamic systems where classical controllers can be ineffective and difficult to tune. However, the lack of closed‐loop stability guarantees for systems controlled by trained DRL agents hinders their adoption in practical applications. This study investigates the closed‐loop stability of such systems using Lyapunov analysis based on a linear‐quadratic polynomial approximation of the trained agent. In addition, it develops an understanding of the system's stability margin in order to determine the operational boundaries and the critical thresholds of the system's physical parameters for effective operation. The proposed analysis is verified on a DRL‐controlled system in several simulated and experimental scenarios. The DRL agent is trained using a detailed dynamic model of a non‐linear system and then tested on the corresponding real‐world hardware platform without any fine‐tuning. Experiments are conducted over a wide range of system states and physical parameters, and the results confirm the validity of the proposed stability analysis (https://youtu.be/QlpeD5sTlPU).
automation & control systems; engineering, electrical & electronic; instruments & instrumentation
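
The abstract names two ingredients: a linear‐quadratic polynomial approximation of the trained agent and a Lyapunov analysis of the resulting closed loop. The sketch below illustrates that general idea only and is not the authors' procedure. Everything in it is an assumption made for illustration: the two‐state plant matrices `A` and `B`, the `policy` function standing in for a trained agent, the sampling region, and the choice of a quadratic Lyapunov function applied to the fitted linear part.

```python
import numpy as np

# Hypothetical plant (not from the paper): x' = A x + B u, unstable in open loop.
A = np.array([[0.0, 1.0],
              [1.0, -0.1]])
B = np.array([0.0, 1.0])

def policy(x):
    # Hypothetical stand-in for a trained DRL agent: a smooth, saturating feedback.
    return 2.0 * np.tanh(-1.5 * x[0] - 0.8 * x[1])

def features(x):
    # Linear and quadratic monomials of the two-state vector.
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x1 * x2, x2 * x2])

def fit_linear_quadratic(policy, n_samples=2000, radius=0.5, seed=0):
    # Least-squares fit of u ~ k^T x + (quadratic terms) on states sampled near the origin.
    rng = np.random.default_rng(seed)
    X = rng.uniform(-radius, radius, size=(n_samples, 2))
    Phi = np.array([features(x) for x in X])
    u = np.array([policy(x) for x in X])
    coeffs, *_ = np.linalg.lstsq(Phi, u, rcond=None)
    return coeffs  # [k1, k2, q11, q12, q22]

def solve_lyapunov(A_cl, Q):
    # Solve A_cl^T P + P A_cl = -Q by vectorizing the matrix equation.
    n = A_cl.shape[0]
    I = np.eye(n)
    M = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry

coeffs = fit_linear_quadratic(policy)
k = coeffs[:2]                  # linear part of the fitted controller
A_cl = A + np.outer(B, k)       # closed-loop linearization about the origin
P = solve_lyapunov(A_cl, np.eye(2))

# V(x) = x^T P x is a local Lyapunov function if P is positive definite.
is_stable = bool(np.all(np.linalg.eigvalsh(P) > 0))
print("fitted coefficients:", coeffs)
print("locally stable:", is_stable)
```

Under these assumptions, a positive‐definite `P` certifies local asymptotic stability of the origin for the fitted linear part only. The fitted quadratic coefficients `coeffs[2:]` indicate how far the agent departs from linear feedback in the sampled region; bounding their effect on the derivative of `V` would be the next step toward estimating a stability margin.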