Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

Zefang Zong, Tong Xia, Meng Zheng, Yong Li
DOI: https://doi.org/10.1145/3625232
IF: 5
2024-01-25
ACM Transactions on Intelligent Systems and Technology
Abstract: Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to 11.7% compared to other RL baselines, and can generate solutions for instances within seconds, whereas existing heuristic baselines take minutes, while maintaining the quality of solutions. Moreover, our solution is thoroughly analysed, with meaningful implications arising from its real-time response ability.
computer science, information systems, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the Vehicle Routing Problem with Time Window (VRPTW), which arises widely in logistics and on-demand services such as online food delivery and ride-hailing platforms. The goal is to find vehicle routes that serve each customer within a specified time window while minimizing the total travel distance. Existing heuristic and meta-heuristic algorithms provide approximate solutions but are too slow for real-time response requirements.

The paper proposes an end-to-end framework based on Reinforcement Learning (RL) to solve VRPTW. The main innovations are:
1. An agent model that encodes constraints as input features and applies a strict ("harsh") policy on the output when generating deterministic results.
2. A time penalty augmented reward that models the time window limits during gradient propagation.
3. A task handler that enables cooperation among the different vehicles in the fleet.

Experiments on two real-world datasets and one public benchmark dataset show that the method improves performance by up to 11.7% over other RL baselines and generates solutions for instances within seconds, whereas existing heuristic baselines take minutes, while maintaining solution quality. Furthermore, the solution responds in real time and adapts well to newly emerging business demands.
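
The idea of a time penalty augmented reward can be illustrated with a minimal sketch. The abstract does not give the paper's reward formula, so everything below is an assumption made for illustration: the function name `route_reward`, the linear lateness penalty, the `penalty_weight` and `speed` parameters, and the rule that a vehicle arriving early waits until the window opens. The sketch scores one vehicle's route as the negative of its travel distance plus a weighted sum of late arrivals, so that an RL agent maximizing this reward is pushed toward short routes that respect the time windows.

```python
from math import hypot


def route_reward(route, coords, windows, speed=1.0, penalty_weight=1.0):
    """Hypothetical reward for a single vehicle route (not the paper's formula).

    route:   node indices visited in order, e.g. [0, 1, 2, 0] with depot 0
    coords:  list of (x, y) positions, indexed by node
    windows: list of (earliest, latest) service times, indexed by node
    Returns -(total distance + penalty_weight * total lateness).
    """
    distance = 0.0
    lateness = 0.0
    clock = 0.0
    for prev, cur in zip(route, route[1:]):
        leg = hypot(coords[cur][0] - coords[prev][0],
                    coords[cur][1] - coords[prev][1])
        distance += leg
        clock += leg / speed
        earliest, latest = windows[cur]
        if clock < earliest:
            # Assumption: an early vehicle waits until the window opens.
            clock = earliest
        # Assumption: lateness is penalized linearly (soft time window).
        lateness += max(0.0, clock - latest)
    return -(distance + penalty_weight * lateness)


coords = [(0, 0), (3, 4), (6, 8)]
windows = [(0, 100), (0, 4), (0, 100)]  # node 1 must be served by t = 4
print(route_reward([0, 1, 2, 0], coords, windows))
```

In this toy instance the vehicle reaches node 1 at time 5, one unit after its window closes, so the reward is the negative of the 20-unit tour length plus one unit of penalty. Raising `penalty_weight` makes late arrivals more costly relative to distance, which is one simple way to trade route length against time window violations.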