Abstract: We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators which minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies to reconstruct the utility functions of a cognitive radar network from observed tracking signals.

What problem does this paper attempt to address?

This paper addresses how to robustly reconstruct the utility function of each agent in a multi-agent system from noisy observed signals. Specifically, the authors aim to determine whether the agents are making coordinated decisions, and to reconstruct their utility functions accurately, even when the observations are noisy.

### Problem Background

In multi-agent systems such as cognitive radar networks or unmanned aerial vehicle (UAV) networks, an important problem is to determine from observed signals whether multiple sensors are working cooperatively (i.e., whether their behavior is Pareto-optimal) and to reconstruct the utility function of each sensor. This problem is known as Multi-Agent Inverse Reinforcement Learning (MA-IRL).

### Core Contributions of the Paper

1. **A Robust MA-IRL Algorithm**: The algorithm is built on a distributionally robust optimization framework that uses the Wasserstein distance, and it minimizes the worst-case prediction error over an ambiguity set centered at the noisy observations.
2. **Theoretical Proof**: The robust estimation problem is proved equivalent to a semi-infinite optimization reformulation, and a consistent algorithm is proposed to compute its solutions.
3. **Numerical Experiment Verification**: Numerical studies that reconstruct the utility functions of a cognitive radar network from observed tracking signals demonstrate the effectiveness of the robust IRL algorithm. In particular, with noisy observation data it significantly improves the worst-case reconstruction accuracy.

### Mathematical Formulas

- **Wasserstein Distance**: \[ W(Q, P)=\inf_{\pi\in\Pi(Q, P)}\int_{X\times X}\|x - y\|_2\,\pi(dx, dy) \] where $\Pi(Q, P)$ is the set of all joint probability distributions on $X\times X$ whose marginals are $Q$ and $P$.
- **Robust Estimation Objective**: \[ \min_{\psi\in\Psi}\sup_{Q\in B_\epsilon(P_T)}\mathbb{E}_{\Phi\sim Q}[h(\psi, \Phi)] \] where $B_\epsilon(P_T)$ is the set of probability distributions whose 1-Wasserstein distance from the empirical distribution $P_T$ does not exceed $\epsilon$.

### Summary

By applying distributionally robust optimization, this paper addresses utility function reconstruction in multi-agent systems with noisy observation data, providing theoretical guarantees and numerical verification of its effectiveness. The method improves the worst-case reconstruction accuracy while maintaining good average performance.
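To make the 1-Wasserstein distance and the ambiguity set $B_\epsilon(P_T)$ from the formulas above more concrete, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm: it assumes scalar (one-dimensional) signals and two uniform empirical distributions with the same number of samples, in which case the optimal coupling simply pairs the sorted samples. The function names `wasserstein_1d` and `in_ambiguity_ball` are hypothetical and chosen for this example.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1-Wasserstein distance between two uniform empirical distributions
    on the real line with equally many samples (simplifying assumption).

    In one dimension the optimal coupling matches the i-th smallest sample
    of xs with the i-th smallest sample of ys.
    """
    if len(xs) == 0 or len(xs) != len(ys):
        raise ValueError("both samples must be non-empty and of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


def in_ambiguity_ball(candidate: Sequence[float],
                      observed: Sequence[float],
                      epsilon: float) -> bool:
    """Check whether the empirical distribution of `candidate` lies in the
    Wasserstein ball of radius `epsilon` around that of `observed`."""
    return wasserstein_1d(candidate, observed) <= epsilon


# Hypothetical noisy observations and a uniformly shifted copy of them.
observed = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 0.25 for x in observed]

print(wasserstein_1d(observed, shifted))         # 0.25
print(in_ambiguity_ball(shifted, observed, 0.3))  # True
print(in_ambiguity_ball(shifted, observed, 0.1))  # False
```

The robust estimator then guards against every distribution inside such a ball rather than only the observed one, so a larger $\epsilon$ hedges against more observation noise at the cost of a more conservative estimate.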

Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing

On Multi-Agent Inverse Reinforcement Learning

Identifying Coordination in a Cognitive Radar Network -- A Multi-Objective Inverse Reinforcement Learning Approach

Identifying Cognitive Radars -- Inverse Reinforcement Learning using Revealed Preferences

Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

Online Observer-Based Inverse Reinforcement Learning

Towards Theoretical Understanding of Inverse Reinforcement Learning

Wasserstein Distributionally Robust Control and State Estimation for Partially Observable Linear Systems

Inverse Reinforcement Learning with Explicit Policy Estimates

A Bayesian Approach to Robust Inverse Reinforcement Learning

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

Distributionally Robust Constrained Reinforcement Learning under Strong Duality

Inverse Reinforcement Learning with Sub-optimal Experts

Partial Identifiability and Misspecification in Inverse Reinforcement Learning

Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Efficient Sampling-Based Maximum Entropy Inverse Reinforcement Learning with Application to Autonomous Driving

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars