A comparison of reinforcement learning policies for dynamic vehicle routing problems with stochastic customer requests
Fabian Akkerman,Martijn Mes,Willem van Jaarsveld
DOI: https://doi.org/10.1016/j.cie.2024.110747
IF: 7.18
2024-12-02
Computers & Industrial Engineering
Abstract:This paper presents directions for using reinforcement learning with neural networks for dynamic vehicle routing problems (DVRPs). DVRPs involve sequential decision-making under uncertainty where the expected future consequences are ideally included in current decision-making. A frequently used framework for these problems is approximate dynamic programming (ADP) or reinforcement learning (RL), often in conjunction with a parametric value function approximation (VFA). A straightforward way to use VFA in DVRP is linear regression (LVFA), but more complex, non-linear predictors, e.g., neural network VFAs (NNVFA), are also widely used. Alternatively, we may represent the policy directly, using a linear policy function approximation (LPFA) or neural network PFA (NNPFA). The abundance of policies and design choices complicate the use of neural networks for DVRPs in research and practice. We provide a structured overview of the similarities and differences between the policy classes. Furthermore, we present an empirical comparison of LVFA, LPFA, NNVFA, and NNPFA policies. The comparison is conducted on several problem variants of the DVRP with stochastic customer requests. To validate our findings, we study realistic extensions of the stylized problem on (i) a same-day parcel pickup and delivery case in the city of Amsterdam, the Netherlands, and (ii) the routing of robots in an automated storage and retrieval system (AS/RS). Based on our empirical evaluation, we provide insights into the advantages and disadvantages of neural network policies compared to linear policies, and value-based approaches compared to policy-based approaches.
computer science, interdisciplinary applications,engineering, industrial