POAQL: A Partially Observable Altruistic Q-Learning Method for Cooperative Multi-Agent Reinforcement Learning

Tao Lesong, Kang Miao, Dong Jinpeng, Zhang Songyi, Ye Ke, Chen Shitao, Zheng Nanning
DOI: https://doi.org/10.1109/icra57147.2024.10610745
2024-01-01
Abstract: Multi-Agent Path Finding (MAPF) is an important problem in multi-agent cooperation. Many studies apply Multi-Agent Reinforcement Learning (MARL) to solve MAPF in partially observable settings. The objective of cooperative MARL is to maximize the cumulative team reward. In partially observable settings, however, the team reward is misleading because it depends on unpredictable factors: the behavior and state of agents that are not observed. To address this issue, we propose a Partially Observable Altruistic Q-learning (POAQL) method. POAQL considers the cumulative reward of the observed subteam instead of the whole team, and Altruistic Q-learning plays an important role in learning the subteam action value. In addition, we design a new conflict resolution mechanism that requires no additional guidance, which emphasizes the cooperative nature of MARL frameworks. Experimental results show that POAQL outperforms existing reinforcement learning methods in terms of efficiency and performance.
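The abstract does not give the formulation, so the following is only an illustrative sketch of the contrast between a team reward and the reward of an observed subteam. Everything in it is an assumption made for illustration and is not taken from the paper: the grid positions, the square field of view, the per-agent reward values, and the helper names `observed_subteam`, `subteam_reward`, and `team_reward`.

```python
from typing import List, Tuple

Position = Tuple[int, int]


def observed_subteam(agent: int, positions: List[Position], fov_radius: int) -> List[int]:
    """Indices of agents inside `agent`'s field of view, itself included.

    Assumption: the field of view is a square of half-width `fov_radius`
    (Chebyshev distance) centred on the agent.
    """
    ax, ay = positions[agent]
    return [
        j
        for j, (x, y) in enumerate(positions)
        if max(abs(x - ax), abs(y - ay)) <= fov_radius
    ]


def subteam_reward(agent: int, positions: List[Position],
                   rewards: List[float], fov_radius: int) -> float:
    """Sum of per-agent rewards over the agents that `agent` can observe."""
    return sum(rewards[j] for j in observed_subteam(agent, positions, fov_radius))


def team_reward(rewards: List[float]) -> float:
    """Sum of per-agent rewards over the whole team."""
    return sum(rewards)


# Hypothetical three-agent example: agent 2 is far from agents 0 and 1.
positions = [(0, 0), (1, 1), (5, 5)]
rewards = [-1.0, -1.0, -5.0]  # made-up step rewards; agent 2 is penalised

print(observed_subteam(0, positions, fov_radius=2))
print(subteam_reward(0, positions, rewards, fov_radius=2))
print(team_reward(rewards))
```

In this toy setup, agent 0 cannot see agent 2, so the team reward includes a penalty that agent 0 has no information to explain, while the subteam reward only aggregates over agents 0 and 1.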