Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

Feng Huang, Ming Cao, Long Wang
DOI: https://doi.org/10.1109/tac.2024.3510596
IF: 6.549
2024-01-01
IEEE Transactions on Automatic Control
Abstract: In stochastic dynamic environments, multi-agent Markov decision processes have emerged as a versatile paradigm for studying sequential decision-making problems in fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown and must be estimated from noisy data in practice. To investigate the sensitivity of optimal policies to these uncertain parameters, we study a robust stochastic control problem for multi-agent Markov decision processes in which all agents constitute a centralized controller that seeks to maximize the long-term return of all agents, while the uncertainty acts as a disturbance to this goal. We provide a solution concept, robust team optimality, for the decisions of all agents. To find such a solution, we develop a robust iterative policy learning algorithm for all agents and present its convergence analysis. Compared with robust dynamic programming, this algorithm not only converges faster but also allows approximate calculations that reduce the required computational resources. Finally, numerical simulations that extend the model of sequential social dilemmas to uncertain scenarios demonstrate the effectiveness of the algorithm.
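As a rough illustration of the kind of procedure the abstract describes, the sketch below runs a generic robust policy iteration on a tiny tabular problem. It is not the authors' algorithm. It assumes a finite set of candidate transition models, an adversary that picks the worst model independently for each state and joint action, a single action index standing in for the joint action of the centralized controller, and a fixed number of evaluation sweeps as the "approximate" step. The function name, parameters, and toy numbers are all hypothetical.

```python
def robust_policy_iteration(P, R, gamma=0.9, eval_sweeps=200, max_iters=100):
    """Generic robust policy iteration sketch (hypothetical, not from the paper).

    P[k][s][a][t]: probability of moving from state s to state t under joint
                   action a in candidate model k (assumed finite uncertainty set).
    R[s][a]:       shared team reward for joint action a in state s.
    """
    K, S, A = len(P), len(R), len(R[0])

    def worst_case(V, s, a):
        # Adversarial expectation: the candidate model with the lowest next value.
        return min(sum(P[k][s][a][t] * V[t] for t in range(S)) for k in range(K))

    policy = [0] * S
    V = [0.0] * S
    for _ in range(max_iters):
        # Approximate robust policy evaluation: a fixed number of sweeps
        # instead of solving the evaluation equations exactly.
        for _ in range(eval_sweeps):
            V = [R[s][policy[s]] + gamma * worst_case(V, s, policy[s])
                 for s in range(S)]
        # Greedy improvement against the worst-case model.
        new_policy = [
            max(range(A), key=lambda a: R[s][a] + gamma * worst_case(V, s, a))
            for s in range(S)
        ]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, V


# Toy example: in state 0, action 1 pays more now, but one candidate model
# sends the team to an absorbing zero-reward state 1.
P = [
    [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]],  # model 0
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],  # model 1
]
R = [[1.0, 2.0], [0.0, 0.0]]
policy, V = robust_policy_iteration(P, R)
print(policy, V)
```

In this toy case the robust policy prefers the safe action in state 0, because the higher immediate reward of the other action is outweighed by its worst-case transition.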