Cooperative Multi-Agent Reinforcement Learning in Express System

Yexin Li, Yu Zheng, Qiang Yang
DOI: https://doi.org/10.1145/3340531.3411871
2020-01-01
Abstract: Express systems are widely deployed in many major cities. One important type of task in these systems is picking up packages from customers in a timely manner. Because pick-up requests arrive in real time and many couriers are picking up packages simultaneously, dispatching couriers so that they cooperate with one another and complete more pick-up tasks over the long term is important but challenging. In this paper, we propose a reinforcement learning based framework to learn courier dispatching policies. First, we divide the city into independent regions, within each of which a constant number of couriers pick up packages at the same time. Besides reducing problem complexity, this city division has practical operational benefits. We then focus on each region separately. For each region, we propose a Cooperative Multi-Agent Reinforcement Learning model, i.e., CMARL, to learn the optimal courier dispatching policy within it. CMARL aims to maximize the total number of pick-up tasks completed by all couriers over the long term. Our model achieves this by combining two Markov Decision Processes (MDPs): one guarantees cooperation among couriers, and the other ensures long-term optimization. After obtaining the value functions of these two MDPs, we design a new value function that trades them off, from which the courier dispatching policy is inferred. Experiments on real-world road network data and historical express data from Beijing confirm the superiority of our model over nine baselines.
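The abstract does not say how the two value functions are combined or how a dispatch decision is read off the combined value. The sketch below only illustrates the general idea under loudly stated assumptions: the trade-off is a hypothetical linear weight `alpha`, the values of the "cooperation" MDP and the "long-term" MDP for each (courier, request) pair are supplied as plain dictionaries, and dispatch is a greedy one-to-one assignment within a single region. The names `combined_value`, `dispatch`, and `alpha` are hypothetical and do not come from the paper; this is not CMARL itself.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical value tables keyed by (courier_id, request_id).
# In CMARL these would come from two learned MDP value functions;
# here they are simply numbers supplied by the caller (assumption).
ValueTable = Dict[Tuple[str, str], float]


def combined_value(v_coop: float, v_long: float, alpha: float) -> float:
    """Assumed linear trade-off between the two values, with alpha in [0, 1]."""
    return alpha * v_coop + (1.0 - alpha) * v_long


def dispatch(
    couriers: List[str],
    requests: List[str],
    v_coop: ValueTable,
    v_long: ValueTable,
    alpha: float = 0.5,
) -> Dict[str, str]:
    """Greedy one-to-one dispatch, highest combined value first.

    Each courier gets at most one request and each request goes to at most
    one courier, so no two couriers are sent after the same package.
    """
    scored: List[Tuple[float, str, str]] = []
    for c in couriers:
        for r in requests:
            key = (c, r)
            # Only pairs valued by both (hypothetical) value functions are eligible.
            if key in v_coop and key in v_long:
                scored.append((combined_value(v_coop[key], v_long[key], alpha), c, r))

    # Highest score first; ties broken by courier id, then request id.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    assignment: Dict[str, str] = {}
    taken: Set[str] = set()
    for _, c, r in scored:
        if c not in assignment and r not in taken:
            assignment[c] = r
            taken.add(r)
    return assignment


if __name__ == "__main__":
    # Toy region with two couriers and two pending requests (made-up values).
    v_coop = {("c1", "r1"): 1.0, ("c1", "r2"): 0.2, ("c2", "r1"): 0.3, ("c2", "r2"): 0.8}
    v_long = {("c1", "r1"): 0.1, ("c1", "r2"): 0.9, ("c2", "r1"): 0.8, ("c2", "r2"): 0.2}
    print(dispatch(["c1", "c2"], ["r1", "r2"], v_coop, v_long, alpha=1.0))
    print(dispatch(["c1", "c2"], ["r1", "r2"], v_coop, v_long, alpha=0.0))
```

With these made-up tables, `alpha=1.0` (cooperation value only) sends `c1` to `r1` and `c2` to `r2`, while `alpha=0.0` (long-term value only) swaps them, which shows how the weight shifts the policy. Greedy matching keeps the sketch short but is not guaranteed to maximize the total combined value; an optimal bipartite matching such as `scipy.optimize.linear_sum_assignment` would be a natural alternative.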