STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning.

Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du
DOI: https://doi.org/10.48550/arxiv.2304.07520
2023-01-01
Abstract: Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of the major challenges is credit assignment, which aims to credit agents by their contributions. While prior studies have shown great success, their methods typically fail to work in episodic reinforcement learning scenarios where global rewards are revealed only at the end of the episode. They lack the functionality to model complicated relations of the delayed global reward in the temporal dimension and suffer from inefficiencies. To tackle this, we introduce Spatial-Temporal Attention with Shapley (STAS), a novel method that learns credit assignment in both temporal and spatial dimensions. It first decomposes the global return back to each time step, then utilizes the Shapley Value to redistribute the individual payoff from the decomposed global reward. To mitigate the computational complexity of the Shapley Value, we introduce an approximation of marginal contribution and utilize Monte Carlo sampling to estimate it. We evaluate our method on an Alice Bob example and MPE environments across different scenarios. Our results demonstrate that our method effectively assigns spatial-temporal credit, outperforming all state-of-the-art baselines.
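The abstract mentions estimating Shapley Values by Monte Carlo sampling. As a rough illustration of that general idea only, the sketch below uses the textbook permutation-sampling estimator: it averages each agent's marginal contribution over randomly ordered coalitions. It is not the STAS architecture or the paper's own approximation of marginal contribution. The names `monte_carlo_shapley` and `toy_value`, and the toy payoff itself, are hypothetical and chosen purely for illustration.

```python
import random
from typing import Callable, FrozenSet, List


def monte_carlo_shapley(
    n_agents: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_samples: int = 2000,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by sampling random agent orderings.

    Generic permutation-sampling estimator (an assumption for this sketch,
    not the paper's exact procedure).
    """
    rng = random.Random(seed)
    totals = [0.0] * n_agents
    order = list(range(n_agents))
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = set()
        prev_value = value_fn(frozenset(coalition))
        for agent in order:
            # Marginal contribution of `agent` when joining the coalition
            # formed by the agents that precede it in this ordering.
            coalition.add(agent)
            cur_value = value_fn(frozenset(coalition))
            totals[agent] += cur_value - prev_value
            prev_value = cur_value
    return [t / n_samples for t in totals]


def toy_value(coalition: FrozenSet[int]) -> float:
    """Hypothetical payoff: agents 0 and 1 earn 1.0 only when both are
    present; agent 2 independently adds 0.5."""
    value = 0.0
    if 0 in coalition and 1 in coalition:
        value += 1.0
    if 2 in coalition:
        value += 0.5
    return value


phi = monte_carlo_shapley(3, toy_value)
print(phi)
```

In this toy game the estimates for agents 0 and 1 should each come out near 0.5, agent 2 receives 0.5, and the credits sum to the payoff of the full team (1.5). That last property, efficiency, is what makes the Shapley Value a natural way to redistribute a decomposed global reward among agents.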
What problem does this paper attempt to address?