Abstract: The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and acting in a shared multi-agent environment? Observing that humans often provide incentives to influence others' behavior, we propose to equip each RL agent in a multi-agent environment with the ability to give rewards directly to other agents, using a learned incentive function. Each agent learns its own incentive function by explicitly accounting for its impact on the learning of recipients and, through them, the impact on its own extrinsic objective. We demonstrate in experiments that such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games, often by finding a near-optimal division of labor. Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is: **In a multi-agent environment, how can agents that learn and act independently influence the behavior of other agents by giving them rewards, so as to achieve better cooperation and collective performance?** Specifically, the paper studies how, in multi-agent reinforcement learning (MARL), each agent can learn to reward other agents in a way that encourages actions which ultimately benefit the rewarding agent's own extrinsic objective.

### Problem Background

Traditional reinforcement learning (RL) mainly focuses on the single-agent setting, in which an agent learns a policy by maximizing a predefined extrinsic reward function. In a multi-agent environment, however, the agents' goals may not be fully aligned, and the agents must keep learning and interacting in a shared environment. Ensuring that such agents cooperate effectively while each optimizes its own individual objective is a long-standing challenge.

### Solution Proposed in the Paper

To address this problem, the paper proposes a method called **Learning to Incentivize Others (LIO)**. The core idea is to let each agent learn an incentive function through which it gives rewards directly to other agents, thereby influencing their behavior. Specifically:

1. **Learning of the Incentive Function**: Each agent learns not only its own policy but also how to reward other agents based on their behavior. The parameters of the incentive function are updated by gradient ascent, with the goal of maximizing the agent's own extrinsic return.
2. **Learning Coupling across Agents**: The agents' learning processes are coupled: an agent's incentive function changes what the recipients learn, and the recipients' changed behavior in turn affects the agent's own extrinsic return.
3. **Online Cross-Validation**: The effect of an incentive is delayed, because it only shows up after the recipients have updated their policies. The paper therefore uses online cross-validation, so that agents improve their incentive functions gradually over repeated rounds of learning.

### Experimental Results

The paper evaluates LIO in several experiments:

- **Iterated Prisoner's Dilemma (IPD)**: Two LIO agents can converge to reciprocal cooperation.
- **N-Player Escape Room Game (ER)**: LIO agents find a near-optimal division of labor, bringing the collective return close to the theoretical optimum.
- **Cleanup Game**: LIO agents divide labor effectively in a shared-resource task and outperform the baseline methods.

### Summary

Overall, the paper addresses cooperation in multi-agent environments by introducing an incentive mechanism among agents, and shows that learned incentive functions can significantly improve cooperation and collective performance in complex tasks. This suggests a direction for building more efficient and cooperative multi-agent systems.
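
The coupling between an incentive giver and a learning recipient can be illustrated with a small sketch. The toy game below is an assumption made for illustration, not the paper's environment or implementation: a recipient helps with probability `sigmoid(theta)` and pays a hypothetical cost `HELP_COST` for doing so, while the giver earns a hypothetical `GIVER_BENEFIT` whenever the recipient helps. The giver's incentive is a single scalar `eta`, updated by gradient ascent on its own extrinsic reward, differentiated through one learning step of the recipient. The sketch uses exact expected gradients instead of sampled trajectories and ignores any cost of paying incentives; all function and constant names are hypothetical.

```python
import math

# Hypothetical payoffs for a toy two-agent game (assumptions, not the paper's values).
HELP_COST = 1.0       # extrinsic cost the recipient pays when it helps
GIVER_BENEFIT = 10.0  # extrinsic reward the giver receives when the recipient helps


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def recipient_step(theta, eta, lr):
    """One exact policy-gradient step for the recipient.

    The recipient helps with probability p = sigmoid(theta) and maximizes
    p * (eta - HELP_COST): its extrinsic reward plus the incentive eta.
    """
    p = sigmoid(theta)
    return theta + lr * (eta - HELP_COST) * p * (1.0 - p)


def giver_objective(theta, eta, lr):
    """Giver's expected extrinsic reward after the recipient has updated."""
    return GIVER_BENEFIT * sigmoid(recipient_step(theta, eta, lr))


def giver_gradient(theta, eta, lr):
    """Derivative of giver_objective w.r.t. eta (chain rule through recipient_step)."""
    p = sigmoid(theta)
    p_new = sigmoid(recipient_step(theta, eta, lr))
    dtheta_new_deta = lr * p * (1.0 - p)
    return GIVER_BENEFIT * p_new * (1.0 - p_new) * dtheta_new_deta


def train(steps=200, lr_recipient=1.0, lr_giver=0.5, learn_incentive=True):
    """Alternate giver and recipient updates; return (help probability, eta)."""
    theta, eta = 0.0, 0.0
    for _ in range(steps):
        if learn_incentive:
            # Gradient ascent on the giver's own extrinsic reward.
            eta += lr_giver * giver_gradient(theta, eta, lr_recipient)
        # The recipient learns from extrinsic reward plus the current incentive.
        theta = recipient_step(theta, eta, lr_recipient)
    return sigmoid(theta), eta


if __name__ == "__main__":
    print("with learned incentive:", train(learn_incentive=True))
    print("without incentive:     ", train(learn_incentive=False))
```

Under these assumed payoffs, the recipient's probability of helping decays toward zero when no incentive is given, and rises toward one once the learned incentive exceeds the helping cost. The key step is `giver_gradient`: the giver never changes the recipient's policy directly, it only changes the reward the recipient learns from.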

Learning to Incentivize Other Learning Agents

MotiLearn: Contract-Based Incentive Mechanism for Heterogeneous Edge Collaborative Training

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning

Innate-Values-driven Reinforcement Learning for Cooperative Multi-Agent Systems

Prosocial learning agents solve generalized Stag Hunts better than selfish ones

Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

Incentivized Learning in Principal-Agent Bandit Games

Reinforcement learning for encouraging cooperation in a multiagent system

Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

Learning to Steer Markovian Agents under Model Uncertainty

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Multi-Agent Incentive Communication via Decentralized Teammate Modeling

Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents

Individual Reward Assisted Multi-Agent Reinforcement Learning

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning

Adaptive Incentive Design with Learning Agents