Offline Policy Reuse-Guided Anytime Online Collective Multiagent Planning and Its Application to Mobility-on-demand Systems

Wanyuan Wang, Qian Che, Yifeng Zhou, Weiwei Wu, Bo An, Yichuan Jiang
DOI: https://doi.org/10.1007/s10458-024-09650-z
2024-01-01
Autonomous Agents and Multi-Agent Systems
Abstract: The popularity of mobility-on-demand (MoD) systems has driven interest in online collective multiagent planning (Online_CMP), in which spatially distributed servicing agents are planned to meet dynamically arriving demands. For city-scale MoD systems with a fleet of agents, Online_CMP methods must trade off computation time (i.e., real-time responsiveness) against solution quality (i.e., the number of demands served). Directly using an offline policy guarantees real-time responses, but the policy cannot be adjusted dynamically to the actual agent and demand distributions. Search-based online planning methods are adaptive, but they are computationally expensive and do not scale. In this paper, we propose a principled Online_CMP method that reuses and improves the offline policy in an anytime manner. We first model MoD systems as a collective Markov Decision Process (ℂ-MDP) in which the collective behavior of agents affects the joint reward. Given the ℂ-MDP model, we propose a novel state value function to evaluate the policy and a gradient ascent (GA) technique to improve it. We further show that offline GA-based policy iteration (GA-PI) can converge to the global optima of the ℂ-MDP under certain conditions. Finally, given real-time information, the offline policy serves as the default plan, and GA-PI is used to improve it and generate an online plan. Experimental results show that our offline policy reuse-guided Online_CMP method significantly outperforms standard online multiagent planning methods on MoD systems such as ride-sharing and security traffic patrolling in terms of both computation time and solution quality.
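The anytime idea described in the abstract (start from the offline policy as the default plan, then improve it by gradient ascent for as long as the time budget allows) can be illustrated with a small sketch. This is not the authors' GA-PI algorithm or their ℂ-MDP model: it is a toy single-step allocation problem in which a softmax policy spreads `n_agents` over zones, and the "collective value" is an assumed smooth surrogate for the number of demands served. All function names, the surrogate reward, and the parameters `budget_s`, `max_iters`, and `step` are hypothetical choices made for illustration.

```python
import math
import time


def softmax(logits):
    """Turn policy logits into a probability distribution over zones."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collective_value(probs, demand, n_agents):
    """Assumed surrogate for demands served (requires demand > 0 per zone).

    Each zone's contribution saturates toward its demand as the expected
    number of agents sent there (n_agents * p) grows.
    """
    return sum(d * (1.0 - math.exp(-n_agents * p / d))
               for p, d in zip(probs, demand))


def value_gradient(logits, demand, n_agents):
    """Gradient of collective_value with respect to the policy logits."""
    probs = softmax(logits)
    # dV/dp_z for the surrogate above.
    dv_dp = [n_agents * math.exp(-n_agents * p / d)
             for p, d in zip(probs, demand)]
    # Chain rule through softmax: dV/dtheta_k = p_k * (dV/dp_k - sum_z p_z dV/dp_z).
    baseline = sum(p * g for p, g in zip(probs, dv_dp))
    return [p * (g - baseline) for p, g in zip(probs, dv_dp)]


def anytime_improve(offline_logits, demand, n_agents,
                    budget_s=0.05, max_iters=1000, step=0.1):
    """Improve the offline policy until the budget runs out.

    The offline policy is the default plan, so the returned plan is never
    worse than it under collective_value, whenever the loop stops.
    """
    best = list(offline_logits)
    best_value = collective_value(softmax(best), demand, n_agents)
    current = list(best)
    deadline = time.monotonic() + budget_s
    for _ in range(max_iters):
        if time.monotonic() >= deadline:
            break
        grad = value_gradient(current, demand, n_agents)
        current = [t + step * g for t, g in zip(current, grad)]
        value = collective_value(softmax(current), demand, n_agents)
        if value > best_value:
            best, best_value = list(current), value
    return softmax(best), best_value


if __name__ == "__main__":
    # Uniform offline policy, but real-time demand is concentrated in zone 0.
    plan, value = anytime_improve([0.0, 0.0, 0.0], [8.0, 1.0, 1.0], n_agents=10)
    print(plan, value)
```

Keeping the best plan found so far is what makes the loop anytime in this sketch: a zero budget returns the offline policy unchanged, and a larger budget can only return an equal or better plan under the assumed value function.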