Abstract: In human societies, people often incorporate fairness in their decisions and treat reciprocally by being kind to those who act kindly. They evaluate the kindness of others' actions not only by monitoring the outcomes but also by considering the intentions. This behavioral concept can be adapted to train cooperative agents in Multi-Agent Reinforcement Learning (MARL). We propose the KindMARL method, where agents' intentions are measured by counterfactual reasoning over the environmental impact of the actions that were available to the agents. More specifically, the current environment state is compared with the estimation of the current environment state provided that the agent had chosen another action. The difference between each agent's reward, as the outcome of its action, with that of its fellow, multiplied by the intention of the fellow is then taken as the fellow's "kindness". If the result of each reward-comparison confirms the agent's superiority, it perceives the fellow's kindness and reduces its own reward. Experimental results in the Cleanup and Harvest environments show that training based on the KindMARL method enabled the agents to earn 89% (resp. 37%) and 44% (resp. 43%) more total rewards than training based on the Inequity Aversion and Social Influence methods. The effectiveness of KindMARL is further supported by experiments in a traffic light control problem.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the problem of improving agent cooperation in Multi-Agent Reinforcement Learning (MARL). Specifically, the authors propose a method called **KindMARL** that enhances cooperation among agents by introducing the concept of "kindness."

### Background and Motivation

In human society, people often consider fairness when making decisions and treat those who behave kindly with kindness. They evaluate others' kindness not only by observing outcomes but also by considering intentions. This behavioral concept can be applied to train cooperative multi-agent systems. However, traditional multi-agent reinforcement learning methods mainly rely on external rewards and overlook intrinsic motivations and social interactions among agents.

### Method Overview

1. **Intention Measurement**:
   - Each agent evaluates the intentions of other agents through counterfactual reasoning. Specifically, the current environment state is compared with an estimate of the state that would have resulted if the agent had chosen a different action.
   - The difference between an agent's reward and a peer's reward, multiplied by that peer's intention, is taken as the peer's "kindness."
2. **Reward Adjustment**:
   - If the reward comparison shows that an agent is better off than a peer, the agent perceives that peer's kindness and reduces its own reward.
   - In this way, agents reciprocate the kindness of others, which promotes cooperation.

### Experimental Validation

The authors conducted experimental validation in the following environments:

1. **Cleanup and Harvest Environments**:
   - These two environments are public resource management problems that require agents to balance effectively between collecting resources and maintaining the environment.
   - Experimental results show that agents trained with the KindMARL method earn more total reward than those trained with the Inequity Aversion (IA) and Social Influence (SI) methods; the abstract reports gains between 37% and 89%.
2. **Traffic Signal Control Problem**:
   - In this task, agents need to coordinate traffic signals at multiple intersections to minimize vehicle travel time.
   - Experimental results show that the KindMARL method outperforms the CoLight and IA methods in reducing the average vehicle travel time.

### Conclusion

By introducing the concept of "kindness," the KindMARL method improves agent cooperation and overall performance in multi-agent reinforcement learning. Experimental results indicate that it performs well in the Cleanup, Harvest, and traffic signal control environments, all of which require a high degree of cooperation. Future work can further optimize parameter search and extend validation to larger-scale datasets.
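The intention and kindness computation summarized in the Method Overview can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`intention`, `kindness`, `shaped_reward`), the use of an L1 distance between state vectors as the measure of environmental impact, and the `weight` parameter are all hypothetical choices made for this example.

```python
from typing import Sequence


def intention(
    actual_state: Sequence[float],
    counterfactual_states: Sequence[Sequence[float]],
) -> float:
    """Assumed intention score for one agent.

    Compares the actual state with the estimated states that would have
    resulted from the agent's other available actions. The mean L1
    distance used here is an assumption of this sketch.
    """
    if not counterfactual_states:
        return 0.0
    distances = [
        sum(abs(a - c) for a, c in zip(actual_state, cf))
        for cf in counterfactual_states
    ]
    return sum(distances) / len(distances)


def kindness(own_reward: float, peer_reward: float, peer_intention: float) -> float:
    """Reward difference multiplied by the peer's intention."""
    return (own_reward - peer_reward) * peer_intention


def shaped_reward(
    own_reward: float,
    peer_rewards: Sequence[float],
    peer_intentions: Sequence[float],
    weight: float = 1.0,  # hypothetical scaling factor, not from the source
) -> float:
    """Reduce the agent's reward by the kindness of peers it outperforms."""
    penalty = 0.0
    for peer_reward, peer_intention in zip(peer_rewards, peer_intentions):
        # Only a comparison that confirms the agent's superiority counts.
        if own_reward > peer_reward:
            penalty += kindness(own_reward, peer_reward, peer_intention)
    return own_reward - weight * penalty


if __name__ == "__main__":
    # One peer earned less (4.0) and one earned more (12.0) than the agent.
    print(shaped_reward(10.0, [4.0, 12.0], [0.5, 2.0], weight=0.1))
```

In this sketch only the peer with the lower reward contributes to the penalty, mirroring the abstract's statement that an agent reduces its own reward when the comparison confirms its superiority. How the counterfactual states are estimated is left outside the sketch.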

Kindness in Multi-Agent Reinforcement Learning

S2rl

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

Mediated Multi-Agent Reinforcement Learning

Promoting Cooperation in Multi-Agent Reinforcement Learning via Mutual Help

Cooperation and Fairness in Multi-Agent Reinforcement Learning

Robust Multi-Agent Reinforcement Learning with Social Empowerment for Coordination and Communication

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Towards a Standardised Performance Evaluation Protocol for Cooperative MARL

Environmental-Impact Based Multi-Agent Reinforcement Learning

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation

From Centralized to Self-Supervised: Pursuing Realistic Multi-Agent Reinforcement Learning

DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning

Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning

Reward-Sharing Relational Networks in Multi-Agent Reinforcement Learning as a Framework for Emergent Behavior

Formal contracts mitigate social dilemmas in multi-agent reinforcement learning

Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

On Solving Cooperative MARL Problems with a Few Good Experiences

Learning to Incentivize Other Learning Agents

Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model