A Multitier Reinforcement Learning Model for a Cooperative Multiagent System

Haobin Shi, Liangjing Zhai, Haibo Wu, Maxwell Hwang, Kao-Shing Hwang, Hsuan-Pei Hsu
DOI: https://doi.org/10.1109/tcds.2020.2970487
IF: 4.546
2020-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: In cooperative multiagent systems with value-based reinforcement learning, agents learn to complete a task by following an optimal policy obtained through iterative value-policy improvement. An important issue is how to design a policy that avoids cooperation dilemmas and brings the agents to a consensus. This article proposes a method that improves the coordination ability of agents in cooperative systems by assessing their cooperative tendency, and that increases the collective payoff through a candidate policy. The method learns cooperative rules by recording the cooperation probabilities of agents in a multitier reinforcement learning model. Candidate action sets are selected by the candidate policy, which considers the payoff of the coalition. The optimal strategy is then selected from these candidate action sets using the Nash bargaining solution (NBS). The method is tested on two cooperative tasks. The results show that the proposed algorithm addresses the instability and ambiguity of win or learn fast policy hill-climbing (WoLF-PHC), requires significantly less memory than the NBS, and is more stable and more efficient than other methods.
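The selection step named in the abstract, choosing a strategy from candidate action sets with the Nash bargaining solution, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the payoff table, and the disagreement point are all hypothetical, and the sketch only shows the textbook NBS rule of maximizing the product of each agent's gain over its disagreement payoff among individually rational candidates.

```python
from math import prod


def nash_bargaining_choice(candidates, payoffs, disagreement):
    """Return the candidate joint action that maximizes the Nash product.

    candidates   -- list of joint actions (hypothetical labels)
    payoffs      -- dict mapping joint action -> tuple of per-agent payoffs
    disagreement -- tuple of per-agent payoffs when no agreement is reached
    Returns None if no candidate is individually rational.
    """
    best, best_product = None, float("-inf")
    for joint_action in candidates:
        # Gain of each agent relative to its disagreement payoff.
        gains = [u - d for u, d in zip(payoffs[joint_action], disagreement)]
        if any(g < 0 for g in gains):
            continue  # some agent would be worse off than with no agreement
        product = prod(gains)
        if product > best_product:
            best, best_product = joint_action, product
    return best


# Hypothetical two-agent example (values are illustrative, not from the paper).
candidates = [("a", "a"), ("a", "b"), ("b", "b")]
payoffs = {
    ("a", "a"): (4, 1),
    ("a", "b"): (3, 3),
    ("b", "b"): (1, 5),
}
disagreement = (1, 1)

chosen = nash_bargaining_choice(candidates, payoffs, disagreement)
print(chosen)
```

With these illustrative numbers the balanced joint action ("a", "b") is chosen, because its gains (2, 2) give a larger product than the lopsided alternatives, whose product is zero. How the paper builds the candidate sets and the cooperation probabilities is not specified in the abstract and is not modeled here.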