Provable distributed adaptive temporal-difference learning over time-varying networks
Junlong Zhu, Bing Li, Lin Wang, Mingchuan Zhang, Ling Xing, Jiangtao Xi, Qingtao Wu
DOI: https://doi.org/10.1016/j.eswa.2023.120406
IF: 8.5
2023-05-18
Expert Systems with Applications
Abstract: Multi-agent reinforcement learning (MARL) has been successfully applied in many fields. In MARL, policy evaluation is one of the crucial problems. The distributed temporal-difference (TD) learning algorithm is one of the most popular methods for solving this problem in a cooperative manner. Despite its empirical success, however, the theory of the adaptive variant of distributed TD learning still remains limited. To fill this gap, we propose an adaptive distributed temporal-difference algorithm (referred to as MS-ADTD) under Markovian sampling over time-varying networks. Furthermore, we rigorously analyze the convergence of MS-ADTD; the theoretical results show that the local estimates converge linearly to a neighborhood of the optimal solution. The theoretical results are also verified by simulation experiments.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science