ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations

Alexander R. Trott, Caiming Xiong, Stephan Zheng, Eric Zhao
Abstract: Multi-agent simulations provide a scalable environment for learning policies that interact with rational agents. However, such policies may fail to generalize to the real world, where agents may differ from their simulated counterparts due to unmodeled irrationality and misspecified reward functions. We introduce ε-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps. While existing notions of multi-agent robustness concern perturbations in the actions of agents, we address a novel robustness objective concerning perturbations in the reward functions of agents. ERMAS provides this robustness by anticipating suboptimal behaviors from other agents, formalized as the worst-case ε-equilibrium. We show empirically that ERMAS yields robust policies for repeated bimatrix games and optimal taxation problems in economic simulations. In particular, in the two-level RL problem posed by the AI Economist (Zheng et al., 2020), ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
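To make the phrase "worst-case ε-equilibrium" concrete, here is a minimal sketch that is not the ERMAS algorithm itself. It assumes a one-shot two-player bimatrix game, considers pure strategies only, and uses brute-force enumeration; the function names and the payoff matrices `A`, `B`, and `W` are hypothetical choices for illustration. A profile counts as an ε-equilibrium when neither player can gain more than ε by deviating unilaterally, and the planner's worst-case value is its lowest payoff over all such profiles.

```python
def pure_epsilon_equilibria(A, B, eps):
    """List pure profiles (i, j) where no player gains more than eps by deviating.

    A[i][j] is the row player's payoff, B[i][j] the column player's payoff.
    """
    n_rows, n_cols = len(A), len(A[0])
    profiles = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Best improvement available to the row player, holding column j fixed.
            row_gain = max(A[k][j] for k in range(n_rows)) - A[i][j]
            # Best improvement available to the column player, holding row i fixed.
            col_gain = max(B[i]) - B[i][j]
            if row_gain <= eps and col_gain <= eps:
                profiles.append((i, j))
    return profiles


def worst_case_value(W, A, B, eps):
    """Lowest planner payoff W[i][j] over pure eps-equilibria (None if there are none)."""
    equilibria = pure_epsilon_equilibria(A, B, eps)
    if not equilibria:
        return None
    return min(W[i][j] for i, j in equilibria)


# Hypothetical payoffs: (0, 0) is the only exact pure equilibrium.
A = [[4, 1], [3.5, 0]]    # row player
B = [[4, 3.5], [1, 0]]    # column player
W = [[10, 4], [4, 0]]     # planner's objective, e.g. social welfare

for eps in (0.0, 0.5, 1.0):
    print(eps, pure_epsilon_equilibria(A, B, eps), worst_case_value(W, A, B, eps))
```

In this toy game the planner's guaranteed value shrinks as ε grows, because more near-rational behaviors become admissible. That is the sense in which a policy optimized against the worst-case ε-equilibrium hedges against agents whose true rewards differ slightly from the simulated ones.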
Computer Science