Abstract: The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Because routing and purchasing are coupled, existing work on TPPs commonly addresses route construction and purchase planning simultaneously. This leads to exact methods with high computational cost and to heuristics with sophisticated designs but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL) that addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach are a bipartite graph representation for TPPs that captures the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to construct the route sequentially. One significant benefit of our framework is that the policy network constructs the route efficiently and, once the route is determined, the associated purchasing plan can be easily derived through linear programming. At the same time, DRL lets us train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large TPP instances and generalizes well across instances of varying sizes and distributions, even to much larger instances never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach significantly outperforms well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also shows an advantage in runtime, especially on large instances.
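To make the separation of routing and purchasing concrete, the following is a minimal sketch, not the paper's implementation. It assumes a restricted TPP in which each market offers a limited supply of each product at a fixed unit price, and that the route is given as a list of market indices starting and ending at a depot. Under these assumptions the purchasing linear program decouples by product, so filling each demand from the cheapest visited markets first solves it exactly without an LP solver. The names `purchase_plan`, `route_length`, and `total_cost`, and the dictionary-based data layout, are hypothetical and chosen only for illustration.

```python
from math import dist


def purchase_plan(visited, price, supply, demand):
    """Cheapest purchasing plan restricted to the visited markets.

    price[m][k] and supply[m][k] give the unit price and available
    quantity of product k at market m; demand[k] is the required amount.
    Returns (plan, cost), where plan maps (market, product) -> quantity.
    Raises ValueError if the visited markets cannot cover the demand.
    """
    plan, cost = {}, 0.0
    for product, need in demand.items():
        # Offers for this product among visited markets, cheapest first.
        offers = sorted(
            (price[m][product], m) for m in visited if product in price[m]
        )
        for unit_price, market in offers:
            if need <= 0:
                break
            qty = min(need, supply[market][product])
            if qty > 0:
                plan[(market, product)] = qty
                cost += unit_price * qty
                need -= qty
        if need > 0:
            raise ValueError(f"demand for {product!r} cannot be met")
    return plan, cost


def route_length(route, coords, depot=0):
    """Euclidean length of the closed tour depot -> route -> depot."""
    tour = [depot, *route, depot]
    return sum(dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))


def total_cost(route, coords, price, supply, demand, depot=0):
    """Global objective: travel cost plus purchasing cost for this route."""
    _, buying = purchase_plan(route, price, supply, demand)
    return route_length(route, coords, depot) + buying
```

In this sketch, `total_cost` plays the role of the global objective: a route proposed by any constructor (in the paper, the policy network) is scored only after the purchasing plan has been derived from it, so the training signal reflects both travel and purchasing costs.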

A Response Surface Model Approach to Parameter Estimation of Reinforcement Learning for the Travelling Salesman Problem

Reinforcement learning for the traveling salesman problem with refueling

Spatial-temporal Pricing for Ride-Sourcing Platform with Reinforcement Learning

Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

A Deep Reinforcement Learning Based Real-Time Solution Policy for the Traveling Salesman Problem

Tuning of reinforcement learning parameters applied to SOP using the Scott–Knott method

How to Evaluate Machine Learning Approaches for Combinatorial Optimization: Application to the Travelling Salesman Problem

Improving Generalization of Deep Reinforcement Learning-based TSP Solvers

Linear Function Approximation as a Resource Efficient Method to Solve the Travelling Salesman Problem

Pareto Frontier Approximation Network (PA-Net) to Solve Bi-objective TSP

Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem

Adaptive Selection of Informative Path Planning Strategies via Reinforcement Learning

PTDRL: Parameter Tuning using Deep Reinforcement Learning

Combining Reinforcement Learning and Optimal Transport for the Traveling Salesman Problem

Routing optimization with Monte Carlo Tree Search-based multi-agent reinforcement learning

Generalization in Deep RL for TSP Problems via Equivariance and Local Search

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

Deep Reinforcement Learning for Traveling Purchaser Problems

CARSS: Cooperative Attention-guided Reinforcement Subpath Synthesis for Solving Traveling Salesman Problem

Distributed Adaptive Reinforcement Learning: A Method for Optimal Routing