Abstract: Recent work applying deep reinforcement learning (DRL) to solve traveling salesman problems (TSP) has shown that DRL-based solvers can be fast and competitive with TSP heuristics for small instances, but do not generalize well to larger instances. In this work, we propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method. Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a stochastic policy that sequentially generates a TSP solution. Our training method includes several innovations: (1) we interleave DRL policy gradient updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply curriculum learning. Finally, we empirically demonstrate that MAGIC is superior to other DRL-based methods on random TSP instances, both in terms of performance and generalizability. Moreover, our method compares favorably against TSP heuristics and other state-of-the-art approaches in terms of performance and computational time.

What problem does this paper attempt to address?

The paper addresses a limitation of deep reinforcement learning (DRL) solvers for the Traveling Salesman Problem (TSP): existing DRL solvers are fast and produce tours close to optimal on small instances, but their performance drops significantly as the instance size grows, so they generalize poorly to larger problems.

To overcome this, the authors propose a new DRL method called MAGIC (Multilayer Perceptron, Attention, Graph Neural Network, Interleaved local search, and Curriculum Learning). MAGIC improves the performance and generalization ability of DRL solvers through the following innovations:

1. **Model Architecture**: MAGIC adopts a deep-learning architecture that combines a Multilayer Perceptron (MLP), a Graph Neural Network (GNN), and an attention model to define a stochastic policy that generates TSP solutions sequentially.
2. **Training Method**:
   - **Alternating Policy Gradient Updates and Local Search**: During training, MAGIC interleaves DRL policy-gradient updates with local search, where the local search uses a new technique.
   - **Simple Baseline**: A novel simple baseline, the Policy Rollout Baseline, is used to reduce the variance of policy-gradient estimates.
   - **Curriculum Learning**: Curriculum learning is applied to assist training and improve the model's generalization ability.
3. **Experimental Verification**: Experiments show that MAGIC outperforms other DRL-based methods on randomly generated TSP instances, in both performance and generalization ability.

In addition, MAGIC is competitive in performance and computation time with traditional TSP heuristics and other state-of-the-art methods.

In summary, the main goal of the paper is to improve the generalization ability and overall performance of DRL-based TSP solvers on large-scale instances by proposing the MAGIC method.
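
To make the training ingredients concrete, here is a minimal Python sketch of two of them: improving a sampled tour with local search, and stepping through instance sizes as a curriculum. It is not the authors' implementation. Plain 2-opt stands in for the paper's new local search technique, a random permutation stands in for a tour sampled from the learned policy, and the names `tour_length`, `two_opt`, and `curriculum_sizes` are hypothetical. The policy network, the policy-gradient update, and the baseline are omitted.

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order `tour`."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def two_opt(points, tour):
    """Plain 2-opt (a stand-in, not the paper's local search):
    reverse a segment whenever doing so shortens the tour."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:  # accept strict improvements only
                    best, best_len = candidate, cand_len
                    improved = True
    return best


def curriculum_sizes(start, stop, step):
    """Hypothetical curriculum: instance sizes from small to large."""
    return list(range(start, stop + 1, step))


rng = random.Random(0)
results = []
for n in curriculum_sizes(10, 30, 10):
    points = [(rng.random(), rng.random()) for _ in range(n)]
    sampled = list(range(n))
    rng.shuffle(sampled)  # assumption: stands in for a tour sampled from the policy
    refined = two_opt(points, sampled)
    # In an interleaved scheme, the refined tour's length would feed the
    # policy-gradient update; here it is only recorded.
    results.append((n, tour_length(points, sampled), tour_length(points, refined)))
```

Each entry of `results` holds the instance size, the length of the sampled tour, and the length after local search, so the effect of the refinement step can be inspected per curriculum stage.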

Improving Generalization of Deep Reinforcement Learning-based TSP Solvers

Uniformity of Markov Elements in Deep Reinforcement Learning for Traffic Signal Control

Generalization in Deep RL for TSP Problems via Equivariance and Local Search

A Deep Reinforcement Learning Based Real-Time Solution Policy for the Traveling Salesman Problem

A Deep Reinforcement Learning Agent for Geometry Online Tutoring

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers

A Reinforcement Learning Approach for Optimizing Multiple Traveling Salesman Problems over Graphs

DAN: Decentralized Attention-based Neural Network for the MinMax Multiple Traveling Salesman Problem

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

Learning Collaborative Policies to Solve NP-hard Routing Problems

Reinforcement Learning-Based Nonautoregressive Solver for Traveling Salesman Problems

Deep Reinforcement Learning for Large-Scale TSP Graph

A deep reinforcement learning with dynamic spatio-temporal graph model for solving urban logistics delivery planning problems

An Efficient Hybrid Graph Network Model for Traveling Salesman Problem with Drone

Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling

A deep reinforcement learning approach for solving the Traveling Salesman Problem with Drone

Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

A deep reinforcement learning algorithm framework for solving multi-objective traveling salesman problem based on feature transformation

Neural TSP Solver with Progressive Distillation.