Abstract: We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system. Specifically, we construct utility estimators which minimize the worst-case prediction error over a Wasserstein ambiguity set centered at noisy signal observations. We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions. We illustrate the efficacy of this robust IRL scheme in numerical studies to reconstruct the utility functions of a cognitive radar network from observed tracking signals.
What problem does this paper attempt to address?
The paper addresses how to robustly reconstruct the utility function of each agent in a multi-agent system from noisy observed signals. Specifically, the aim is to identify whether the agents are making coordinated decisions and to reconstruct their utility functions accurately even when the observations are noisy.
### Problem Background
In multi-agent systems such as cognitive radar networks or unmanned aerial vehicle (UAV) networks, an important problem is to determine from observed signals whether multiple sensors are working cooperatively (i.e., whether their behavior is Pareto-optimal) and, if so, to reconstruct the utility function of each sensor. This problem is known as Multi-Agent Inverse Reinforcement Learning (MA-IRL).
### Core Contributions of the Paper
1. **Proposing a Robust MA-IRL Algorithm**: The algorithm is built on a distributionally robust optimization framework that uses the Wasserstein distance, and it minimizes the worst-case prediction error over an ambiguity set centered at the noisy observations.
2. **Theoretical Proof**: The robust estimation problem is proved to be equivalent to a semi-infinite optimization reformulation, and a consistent algorithm is proposed to compute solutions.
3. **Numerical Experiment Verification**: Numerical studies on a cognitive radar network illustrate the efficacy of the robust IRL scheme, in particular its improved worst-case reconstruction accuracy when the observations are noisy.
### Mathematical Formulas
- **Wasserstein Distance**:
\[
W(Q, P)=\inf_{\pi\in\Pi(Q, P)}\int_{X\times X}\|x - y\|_2\pi(dx, dy)
\]
where $\Pi(Q, P)$ is the set of all joint probability distributions (couplings) on $X\times X$ whose marginals are $Q$ and $P$.
- **Robust Estimation Objective**:
\[
\min_{\psi\in\Psi}\sup_{Q\in B_\epsilon(P_T)}\mathbb{E}_{\Phi\sim Q}[h(\psi, \Phi)]
\]
where $B_\epsilon(P_T)$ is the Wasserstein ball of radius $\epsilon$, i.e., the set of probability distributions whose 1-Wasserstein distance from the empirical distribution $P_T$ does not exceed $\epsilon$.
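
To make the two formulas concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's algorithm. The loss `toy_loss` ($h(\psi,\Phi)=|\psi-\Phi|$), the sample values, and all function names are hypothetical choices for illustration, and the support is assumed to be the whole real line. Under those assumptions, the 1-Wasserstein distance between two equal-size empirical distributions is the mean absolute difference of their sorted samples. Because the toy loss is 1-Lipschitz in $\Phi$, the inner supremum equals the empirical risk plus $\epsilon$. The sketch checks this by building a shifted distribution that lies in the ball and attains that value.

```python
from statistics import mean


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical
    distributions on the real line (sorted-sample pairing is optimal)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return mean(abs(a - b) for a, b in zip(sorted(xs), sorted(ys)))


def toy_loss(psi, phi):
    """Hypothetical stand-in for h(psi, phi); 1-Lipschitz in phi."""
    return abs(psi - phi)


def empirical_risk(psi, samples):
    """Expected toy loss under the empirical distribution of `samples`."""
    return mean(toy_loss(psi, s) for s in samples)


def worst_case_risk(psi, samples, eps):
    """Supremum of the expected toy loss over the 1-Wasserstein ball of
    radius eps. Valid for this 1-Lipschitz toy loss on an unbounded
    support only: the supremum is the empirical risk plus eps."""
    return empirical_risk(psi, samples) + eps


def adversarial_shift(psi, samples, eps):
    """Move every sample eps further away from psi. The result stays
    inside the ball and attains the worst-case risk for the toy loss."""
    return [s + eps if s >= psi else s - eps for s in samples]


samples = [1.0, 2.0, 4.0, 7.0]   # illustrative "noisy observations"
eps = 0.5                        # illustrative ball radius
psi = 2.0                        # candidate estimator

shifted = adversarial_shift(psi, samples, eps)
print(wasserstein_1d(samples, shifted))     # distance of the shifted distribution from P_T
print(empirical_risk(psi, shifted))         # risk attained by the shifted distribution
print(worst_case_risk(psi, samples, eps))   # closed-form worst-case risk

# Outer minimization over a small grid of candidate estimators.
candidates = [0.0, 1.0, 2.0, 3.0, 5.0, 8.0]
best = min(candidates, key=lambda p: worst_case_risk(p, samples, eps))
print(best)
```

For this toy loss the radius only adds a constant to the objective, so the robust and non-robust minimizers coincide. That is a property of the example, not of the general problem, where the inner supremum depends on $\psi$ in a nontrivial way and has to be handled through a reformulation such as the semi-infinite program mentioned above.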
### Summary
By applying distributionally robust optimization, the paper addresses utility function reconstruction in multi-agent systems from noisy observations, with theoretical guarantees and numerical validation on a cognitive radar network. The method improves worst-case reconstruction accuracy while maintaining good average performance.