Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization

Zhen Zhang, Dongqing Wang
DOI: https://doi.org/10.1109/TNNLS.2024.3385097
2024-04-16
Abstract: Multiagent reinforcement learning (MARL) has been extensively applied to coordination optimization for its task distribution and scalability. The goal of the MARL algorithms for coordination optimization is to learn the optimal joint strategy that maximizes the expected cumulative reward of all agents. Some cooperative MARL algorithms exhibit exciting characteristics in empirical studies. However, the majority of the convergence results are confined to repeated games. Moreover, few MARL algorithms consider adaptation to switched environments, such as the alternation between peak hours and off-peak hours of urban traffic flow or an obstacle suddenly appearing on the planned route for the automated guided vehicle. To this end, we propose a cooperative MARL algorithm known as adaptive individual Q-learning (A-IQL). Each agent updates the Q-function of its own action with period T to adapt to the switched environments. Convergence analysis shows that the optimal joint strategy can be obtained in stochastic games with deterministic state transitions occurring in chronological order. The influence of period T on convergence is studied through a fictitious stochastic game. The efficacy of the A-IQL algorithm is validated through two switched environments: the distributed sensor network (DSN) task and the target transportation task.
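The abstract does not give the A-IQL update rule, so the following is only a rough sketch of the general idea it describes: each agent keeps a Q-value per own action and applies its updates once every T steps while the environment switches. It uses plain independent Q-learning on a stateless two-agent team game with a shared reward. The function name `run_periodic_iql`, the payoff matrices, the buffering of rewards between updates, and all hyperparameters are assumptions made for illustration, not details taken from the paper.

```python
import random


def run_periodic_iql(payoffs, steps_per_phase=2000, period=10,
                     alpha=0.1, epsilon=0.1, seed=0):
    """Two independent Q-learners on a stateless team game.

    `payoffs` is a list of shared-reward matrices, one per phase; moving
    from one matrix to the next models a switched environment. Each agent
    only sees its own action and the shared reward. Updates are buffered
    and applied every `period` steps (a hypothetical stand-in for the
    "period T" mentioned in the abstract).
    Returns the greedy joint action at the end of each phase.
    """
    rng = random.Random(seed)
    n_actions = len(payoffs[0])
    q = [[0.0] * n_actions for _ in range(2)]  # q[agent][own_action]
    buffers = [[], []]                         # pending (action, reward) pairs
    greedy_per_phase = []
    t = 0

    def greedy(agent):
        # Ties resolve to the lowest action index.
        return max(range(n_actions), key=lambda a: q[agent][a])

    for payoff in payoffs:
        for _ in range(steps_per_phase):
            # Epsilon-greedy action selection, independently per agent.
            actions = [
                rng.randrange(n_actions) if rng.random() < epsilon
                else greedy(agent)
                for agent in range(2)
            ]
            reward = payoff[actions[0]][actions[1]]  # shared by both agents
            for agent in range(2):
                buffers[agent].append((actions[agent], reward))
            t += 1
            if t % period == 0:
                # Periodic update: replay the buffered samples in order.
                for agent in range(2):
                    for a, r in buffers[agent]:
                        q[agent][a] += alpha * (r - q[agent][a])
                    buffers[agent].clear()
        greedy_per_phase.append((greedy(0), greedy(1)))
    return greedy_per_phase


if __name__ == "__main__":
    phase_a = [[2, 1], [1, 0]]  # joint action (0, 0) is best
    phase_b = [[0, 1], [1, 2]]  # after the switch, (1, 1) is best
    print(run_periodic_iql([phase_a, phase_b]))
```

In this toy setting the constant step size `alpha` lets stale Q-values decay after the switch, which is what allows the agents to move to the new best joint action. Changing `period` in the sketch is one way to get an informal feel for how update frequency interacts with a switching environment; it says nothing about the convergence results claimed for A-IQL.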