Joint Optimization of Pricing, Dispatching and Repositioning in Ride-Hailing with Multiple Models Interplayed Reinforcement Learning

Zhongyun Zhang, Lei Yang, Jiajun Yao, Chao Ma, Jianguo Wang
DOI: https://doi.org/10.1109/tkde.2024.3464563
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Popular ride-hailing products, such as DiDi, Uber and Lyft, provide people with transportation convenience. Pricing, order dispatching and vehicle repositioning are three tightly correlated tasks with complex interactions in ride-hailing platforms, each significantly affecting the others' decisions and the distribution of demand and supply. However, no prior work has considered combining the three tasks to improve platform efficiency. In this paper, we seek to optimize pricing, dispatching and repositioning strategies simultaneously. Such a new multi-stage decision-making problem is quite challenging because it involves complex coordination and lacks a unified problem model. To address this problem, we propose a novel Joint optimization framework of Pricing, Dispatching and Repositioning (JPDR) that integrates contextual bandit and multi-agent deep reinforcement learning. JPDR consists of two components: a Soft Actor-Critic (SAC)-based centralized policy for dispatching and repositioning, and a pricing strategy learned by a multi-armed contextual bandit algorithm based on feedback from the former. Because their updates are highly interdependent, the two components learn in a mutually guided way to achieve joint optimization. Based on real-world data, we implement a realistic environment simulator. Extensive experiments conducted on it show that our method outperforms state-of-the-art baselines in terms of both gross merchandise volume and success rate.