Efficient off‐policy Q‐learning for multi‐agent systems by solving dual games
Yan Wang, Huiwen Xue, Jiwei Wen, Jinfeng Liu, Xiaoli Luan
DOI: https://doi.org/10.1002/rnc.7189
IF: 3.8973
2024-01-10
International Journal of Robust and Nonlinear Control
Abstract: This article develops distributed optimal control policies via Q‐learning for multi‐agent systems (MASs) by solving dual games. First, following game theory, the distributed consensus problem is formulated as a multi‐player non‐zero‐sum game, in which each agent is viewed as a player focusing only on its local performance and the whole MAS achieves Nash equilibrium. Second, for each agent, the anti‐disturbance problem is formulated as a two‐player zero‐sum game, in which the control input and the external disturbance are a pair of opponents. Specifically, (1) an offline, data‐driven, off‐policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which can effectively achieve consensus of MASs with guaranteed l2‐bounded synchronization error; (2) an actor‐critic‐disturbance neural network is employed to implement the MPG algorithm and obtain optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.
automation & control systems; engineering, electrical & electronic; mathematics, applied
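
The abstract names a momentum policy gradient (MPG) update but does not spell it out. As a rough illustration only, the sketch below shows a generic heavy‐ball momentum gradient step on a toy quadratic loss. It is not the article's MPG algorithm: the function name momentum_gradient_descent, the step size lr, the momentum factor beta, and the matrices A and b are all assumptions introduced here, and the multi‐agent, off‐policy, and actor‐critic‐disturbance parts are omitted entirely.

```python
import numpy as np


def momentum_gradient_descent(grad, theta0, lr=0.1, beta=0.9, steps=500):
    """Generic heavy-ball momentum update (illustrative, not the paper's MPG).

    velocity <- beta * velocity + grad(theta)
    theta    <- theta - lr * velocity
    """
    theta = np.asarray(theta0, dtype=float).copy()
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        velocity = beta * velocity + grad(theta)
        theta = theta - lr * velocity
    return theta


# Hypothetical toy loss: 0.5 * theta^T A theta - b^T theta, with A positive definite.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])


def grad(theta):
    # Gradient of the quadratic loss above.
    return A @ theta - b


theta_star = np.linalg.solve(A, b)            # closed-form minimizer for comparison
theta_hat = momentum_gradient_descent(grad, np.zeros(2))
print(np.max(np.abs(theta_hat - theta_star)))  # distance to the minimizer
```

The momentum term reuses past gradient directions, which is the usual motivation for adding it to a plain gradient step; how the article combines it with the critic, actor, and disturbance networks is not stated in the abstract.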