Multi-Objective Distributional Reinforcement Learning for Large-Scale Order Dispatching.

Fan Zhou, Chenfan Lu, Xiaocheng Tang, Fan Zhang, Zhiwei Qin, Jieping Ye, Hongtu Zhu
DOI: https://doi.org/10.1109/icdm51629.2021.00202
2021-01-01
Abstract: The aim of this paper is to develop a multi-objective distributional reinforcement learning framework for improving order dispatching on large-scale ride-hailing platforms. Compared with traditional RL-based approaches that focus on drivers’ income, the proposed framework also accounts for the spatiotemporal difference between the supply and demand networks. Specifically, we model the dispatching problem as a two-objective Semi-Markov Decision Process (SMDP) and estimate the relative importance of the two objectives under an unknown existing policy via Inverse Reinforcement Learning (IRL). Then, we combine Implicit Quantile Networks (IQN) with the traditional Deep Q-Networks (DQN) to jointly learn the two return distributions and adjust their weights to refine the old policy through online planning, achieving higher supply-demand coherence on the platform. We conduct large-scale dispatching experiments to demonstrate that the proposed approach remarkably improves the platform’s efficiency.
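To make the weighting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each objective supplies a list of sampled return quantiles per candidate action (as a quantile network might), that the mean of those samples stands in for the expected return, and that a single scalar weight in [0, 1] expresses the relative importance of the first objective. The names scalarized_values and greedy_action are hypothetical.

```python
from statistics import fmean


def scalarized_values(quantiles_obj1, quantiles_obj2, weight):
    """Combine two per-action return distributions into one score per action.

    quantiles_obj1[a] and quantiles_obj2[a] are lists of sampled quantile
    values for action a (hypothetical inputs, e.g. from a quantile network).
    weight is the assumed relative importance of objective 1, in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    scores = []
    for q1, q2 in zip(quantiles_obj1, quantiles_obj2):
        # The mean of the sampled quantiles approximates the expected return.
        v1 = fmean(q1)
        v2 = fmean(q2)
        # Linear scalarization of the two objectives (an assumption here).
        scores.append(weight * v1 + (1.0 - weight) * v2)
    return scores


def greedy_action(scores):
    """Return the index of the highest-scoring action (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)
```

With two candidate actions, shifting the weight toward the second objective can change which action is selected, which is the lever the abstract describes for refining the old policy.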