Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

Zefang Zong, Tong Xia, Meng Zheng, Yong Li

DOI: https://doi.org/10.1145/3625232

IF: 5

2024-01-25

ACM Transactions on Intelligent Systems and Technology

Abstract: The vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal under several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and applies a harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines, and can generate solutions for instances within seconds, whereas existing heuristic baselines take minutes, while maintaining solution quality. Moreover, our solution is thoroughly analysed, with meaningful implications arising from its real-time response ability.

computer science, information systems, artificial intelligence

What problem does this paper attempt to address?

This paper aims to solve the Vehicle Routing Problem with Time Window (VRPTW), a problem that arises widely in logistics services and real-life scenarios such as online food delivery and ride-hailing platforms. The core of the problem is to find vehicle routes that serve each customer within a specific time window while minimizing the total travel distance. Existing heuristic and meta-heuristic algorithms provide approximate solutions but are too slow for real-time response. The paper proposes an end-to-end framework based on Reinforcement Learning (RL) to solve VRPTW. The main innovations are:

1. An agent model that encodes the constraints as input features and enforces a strict policy on the output when generating deterministic results.
2. A time penalty augmented reward that models the time window limits during gradient propagation.
3. A task handler that enables cooperation among different vehicles.

Experimental results show that this method outperforms other RL baselines by up to 11.7% and generates solutions for instances within a few seconds, whereas traditional heuristic methods take several minutes. The solution therefore offers real-time response capability, which the authors analyse for its practical implications.
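The idea of a reward augmented with a time penalty can be illustrated with a small sketch. This is not the paper's formulation, which is not given on this page; it assumes a simple soft penalty in which the reward is the negative of the route distance plus a weighted sum of lateness. The names `Customer`, `route_reward`, `speed`, and `penalty_weight` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Customer:
    x: float
    y: float
    ready: float          # earliest time service may start
    due: float            # latest time service should start
    service: float = 0.0  # time spent serving the customer


def route_reward(depot, route, speed=1.0, penalty_weight=1.0):
    """Return -(distance + penalty_weight * total lateness) for one vehicle.

    Assumed soft-penalty form: arriving early means waiting until `ready`;
    arriving after `due` adds the delay to the penalty term.
    """
    x, y = depot
    clock = 0.0
    distance = 0.0
    lateness = 0.0
    for c in route:
        leg = hypot(c.x - x, c.y - y)
        distance += leg
        clock += leg / speed
        clock = max(clock, c.ready)           # wait if the vehicle is early
        lateness += max(0.0, clock - c.due)   # accumulate time-window violation
        clock += c.service
        x, y = c.x, c.y
    distance += hypot(depot[0] - x, depot[1] - y)  # return to the depot
    return -(distance + penalty_weight * lateness)


if __name__ == "__main__":
    depot = (0.0, 0.0)
    a = Customer(3.0, 4.0, ready=0.0, due=10.0)
    b = Customer(3.0, 0.0, ready=0.0, due=6.0)
    print(route_reward(depot, [a, b]))  # visits b late, so the penalty applies
    print(route_reward(depot, [b, a]))  # same distance, no lateness
```

In this toy instance both visiting orders cover the same distance, so only the penalty term separates them; a learning signal of this kind is what lets a policy trained by gradient methods prefer routes that respect time windows.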

Multi-Vehicle Routing Problems with Soft Time Windows: A Multi-Agent Reinforcement Learning Approach

Deep Reinforcement Learning Algorithm for Fast Solutions to Vehicle Routing Problem with Time-Windows

Reinforcement Learning for Solving Stochastic Vehicle Routing Problem with Time Windows

Deep Reinforcement Learning for Solving Vehicle Routing Problems With Backhauls

Fast Approximate Solutions using Reinforcement Learning for Dynamic Capacitated Vehicle Routing with Time Windows

Logistics Distribution Route Optimization With Time Windows Based on Multi-Agent Deep Reinforcement Learning

Multi-Task Multi-Objective Evolutionary Search Based on Deep Reinforcement Learning for Multi-Objective Vehicle Routing Problems with Time Windows

Graph attention reinforcement learning with flexible matching policies for multi-depot vehicle routing problems

Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

EFECTIW-ROTER: Deep Reinforcement Learning Approach for Solving Heterogeneous Fleet and Demand Vehicle Routing Problem with Time-Window Constraints

A Hybrid of Deep Reinforcement Learning and Local Search for the Vehicle Routing Problems

SmartPathfinder: Pushing the Limits of Heuristic Solutions for Vehicle Routing Problem with Drones Using Reinforcement Learning

Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem

A multi-agent deep reinforcement learning approach for solving the multi-depot vehicle routing problem

Solving the Vehicle Routing Problem with Stochastic Travel Cost Using Deep Reinforcement Learning

A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints

Solving the Vehicle Routing Problem with Time Windows Using Modified Rat Swarm Optimization Algorithm Based on Large Neighborhood Search

Deep Reinforcement Learning for Multi-Truck Vehicle Routing Problems with Multi-Leg Demand Routes

Edge-DIRECT: A Deep Reinforcement Learning-based Method for Solving Heterogeneous Electric Vehicle Routing Problem with Time Window Constraints

Improved Multi-Agent System for the Vehicle Routing Problem with Time Windows