Abstract: We propose a model-free reinforcement learning architecture, called distributed attentional actor architecture after conditional attention (DA6-X), to provide better interpretability of conditional coordinated behaviors. The underlying principle involves reusing the saliency vector, which represents the conditional states of the environment, such as the global position of agents. Hence, agents with DA6-X flexibility built into their policy exhibit superior performance by considering the additional information in the conditional states during the decision-making process. The effectiveness of the proposed method was experimentally evaluated by comparing it with conventional methods in an objects collection game. By visualizing the attention weights from DA6-X, we confirmed that agents successfully learn situation-dependent coordinated behaviors by correctly identifying various conditional states, leading to improved interpretability of agents along with superior performance.

### What problem does this paper attempt to solve?

This paper aims to solve the interpretability problem of conditional coordinated behaviors in multi-agent reinforcement learning (MARL). Specifically, it proposes a new model named Distributed Attentional Actor Architecture after Conditional Attention (DA6-X) to improve the understanding and interpretability of conditional coordinated behaviors.

#### Main problems:

1. **Limitations of existing methods**: Existing interpretability research on multi-agent systems (MAS) mainly focuses on centralized systems, and these methods cannot fully explain the black-box coordination mechanisms of decentralized systems. Moreover, although some methods can learn conditional behaviors, they do not ensure that the learned results are interpretable, making it difficult to verify that agents have acquired the expected conditional behaviors.
2. **Impact of conditional states**: In multi-agent systems, agents must flexibly adjust their behavior according to the environment and the states of other agents. However, existing methods handle this conditional dependence poorly, which limits improvements in both interpretability and performance.

#### Proposed solutions:

- **DA6-X model**: By introducing the Conditional Module (CM), DA6-X can reuse the saliency vector, thereby enhancing the understanding and interpretation of conditional states. Specifically, DA6-X combines conditional states and local observations, enabling agents to consider more information in the decision-making process and thus achieve more efficient coordinated behaviors.
- **Two-layer data structure**: DA6-X processes two layers of data: conditional states (such as global position) and local observations. In this way, agents can flexibly adjust their behavior strategies under different conditions and provide explanations for their behaviors.
- **Application of attention mechanism**: Extracting the attention weights in the local transformer encoder to generate attention heatmaps intuitively shows which parts of the input data describe conditional aspects, which helps to understand how conditional states affect the behavior of agents.

### Summary:

The main goal of this paper is to solve the interpretability problem of conditional coordinated behaviors in multi-agent reinforcement learning by proposing the DA6-X model, with a focus on how agents can flexibly adjust their behavior strategies and provide explanations under different conditional states. The approach not only improves the performance of agents but also enhances the transparency and interpretability of their decision-making processes.
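The attention-heatmap idea listed under the proposed solutions can be illustrated with a minimal sketch. This is not the DA6-X implementation: it assumes a single attention head, random projection matrices, a hypothetical 5x5 local view, and two hypothetical conditional-state tokens. The function name `saliency_attention` and all shapes are illustrative choices. It only shows how the weights of one saliency query over conditional and observation tokens could be split and reshaped into a heatmap.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def saliency_attention(saliency, cond_tokens, obs_tokens, w_q, w_k):
    """Attention weights of one saliency query over all input tokens.

    Hypothetical helper: scaled dot-product attention with a single head.
    """
    # Stack conditional-state tokens in front of local-observation tokens.
    tokens = np.concatenate([cond_tokens, obs_tokens], axis=0)  # (n, d)
    q = saliency @ w_q                                          # (d_k,)
    k = tokens @ w_k                                            # (n, d_k)
    scores = k @ q / np.sqrt(k.shape[-1])                       # (n,)
    return softmax(scores)


rng = np.random.default_rng(0)
d, d_k = 8, 4        # assumed embedding and key sizes
grid, n_cond = 5, 2  # assumed 5x5 local view and 2 conditional tokens

saliency = rng.normal(size=d)
cond_tokens = rng.normal(size=(n_cond, d))       # e.g. encoded global position
obs_tokens = rng.normal(size=(grid * grid, d))   # one token per observed cell
w_q = rng.normal(size=(d, d_k))
w_k = rng.normal(size=(d, d_k))

weights = saliency_attention(saliency, cond_tokens, obs_tokens, w_q, w_k)
cond_weights = weights[:n_cond]                  # attention on conditional states
heatmap = weights[n_cond:].reshape(grid, grid)   # attention over the local view
```

In a trained model, `heatmap` could be rendered with any plotting library, and a large value in `cond_weights` would suggest that the agent's decision relies on the corresponding conditional state.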

Interpretability for Conditional Coordinated Behavior in Multi-Agent Reinforcement Learning

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Distributed Multi-Agent Deep Reinforcement Learning for Robust Coordination against Noise

Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

Joint Attention for Multi-Agent Coordination and Social Learning

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

Cascaded Attention: Adaptive and Gated Graph Attention Network for Multiagent Reinforcement Learning

Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning.

Learning Multi-Agent Communication with Double Attentional Deep Reinforcement Learning

N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning

Active Legibility in Multiagent Reinforcement Learning

Complementary Attention for Multi-Agent Reinforcement Learning.

An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments

Do Deep Reinforcement Learning Agents Model Intentions?

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning

Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention

AVD-Net: Attention Value Decomposition Network for Deep Multi-Agent Reinforcement Learning