Reinforcement Learning for the Pickup and Delivery Problem

Fagui Liu, Chengqi Lai, Lvshengbiao Wang
DOI: https://doi.org/10.1007/978-3-031-15931-2_8
2022-01-01
Abstract: The pickup and delivery problem (PDP) and its variants are an important part of urban logistics and distribution, and many heuristic algorithms have been developed to solve them. However, as the scale of logistics continues to grow, these methods generally suffer from excessively long computation times. To address this problem, we propose a reinforcement learning (RL) model based on the Advantage Actor-Critic (A2C) framework, which treats the PDP as a sequential decision problem. The actor, built on an attention mechanism, generates routing strategies, while the critic is designed to improve solution quality during training. The model is trained using policy gradient. Experimental results show that, compared with heuristic algorithms and a previous RL approach, the proposed model has a clear advantage in computation time and is also competitive in solution quality.
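The abstract does not specify the architecture in detail. The following is therefore only a minimal sketch, not the authors' model, of how a PDP route can be built as a sequence of decisions and scored with an advantage actor-critic loss. It rests on several assumptions:

- A single uncapacitated vehicle starts and ends at depot node 0.
- Pickup `i` is paired with delivery `i + n`.
- An untrained scaled dot-product attention, with identity projections, stands in for the learned actor.
- The reward is the negative tour length.
- The critic's output is replaced by a fixed hypothetical value.

All function names (`feasible_mask`, `attention_probs`, `rollout`, `tour_length`, `a2c_losses`) are hypothetical.

```python
import math
import random

# Toy PDP encoding (assumption, not from the paper): node 0 is the depot,
# pickup i (1..n) must be visited before its paired delivery i + n.
Point = tuple[float, float]


def feasible_mask(visited: set[int], n_pairs: int) -> list[bool]:
    """True for every node the vehicle may visit next under PDP precedence."""
    mask = []
    for node in range(2 * n_pairs + 1):
        if node == 0 or node in visited:
            mask.append(False)                      # depot only at start/end; no revisits
        elif node <= n_pairs:
            mask.append(True)                       # pickups are always open
        else:
            mask.append(node - n_pairs in visited)  # delivery only after its pickup
    return mask


def attention_probs(query: Point, keys: list[Point], mask: list[bool]) -> list[float]:
    """Scaled dot-product scores over nodes, masked, then softmax.

    Identity projections stand in for the learned W_Q / W_K of a real actor.
    """
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale if ok else -math.inf
              for key, ok in zip(keys, mask)]
    top = max(logits)                               # at least one node is feasible
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(coords: list[Point], n_pairs: int, rng: random.Random) -> tuple[list[int], float]:
    """Build a route one node per step (the sequential decision view) and
    accumulate the log-probability of the sampled actions."""
    route, visited, log_prob, current = [0], set(), 0.0, 0
    for _ in range(2 * n_pairs):
        probs = attention_probs(coords[current], coords, feasible_mask(visited, n_pairs))
        nxt = rng.choices(range(len(coords)), weights=probs)[0]
        log_prob += math.log(probs[nxt])
        route.append(nxt)
        visited.add(nxt)
        current = nxt
    route.append(0)                                 # return to the depot
    return route, log_prob


def tour_length(coords: list[Point], route: list[int]) -> float:
    """Total Euclidean length of the route."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def a2c_losses(log_prob: float, reward: float, value: float) -> tuple[float, float]:
    """Advantage actor-critic loss terms for one sampled route.

    In an autodiff framework the advantage would be detached in the actor term.
    """
    advantage = reward - value                      # critic value acts as a baseline
    actor_loss = -advantage * log_prob              # policy-gradient (REINFORCE) term
    critic_loss = advantage ** 2                    # critic regresses toward the reward
    return actor_loss, critic_loss


if __name__ == "__main__":
    rng = random.Random(0)
    n_pairs = 3
    coords = [(rng.random(), rng.random()) for _ in range(2 * n_pairs + 1)]
    route, log_prob = rollout(coords, n_pairs, rng)
    reward = -tour_length(coords, route)            # shorter tour -> higher reward
    value = -2.0                                    # hypothetical critic estimate
    actor_loss, critic_loss = a2c_losses(log_prob, reward, value)
    print(route, round(reward, 3), round(actor_loss, 3), round(critic_loss, 3))
```

The mask is what makes this a PDP rather than a plain TSP: a delivery becomes selectable only once its pickup is on the route. In a trained model, the critic would predict the expected reward from the instance. The actor loss then pushes up the probability of routes that beat that prediction.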