Improving Generalization of Deep Reinforcement Learning-based TSP Solvers

Wenbin Ouyang, Yisen Wang, Shaochen Han, Zhejian Jin, Paul Weng
DOI: https://doi.org/10.48550/arXiv.2110.02843
2021-10-06
Abstract: Recent work applying deep reinforcement learning (DRL) to solve traveling salesman problems (TSP) has shown that DRL-based solvers can be fast and competitive with TSP heuristics for small instances, but do not generalize well to larger instances. In this work, we propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method. Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a stochastic policy that sequentially generates a TSP solution. Our training method includes several innovations: (1) we interleave DRL policy gradient updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply curriculum learning. Finally, we empirically demonstrate that MAGIC is superior to other DRL-based methods on random TSP instances, both in terms of performance and generalizability. Moreover, our method compares favorably against TSP heuristics and other state-of-the-art approaches in terms of performance and computational time.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the poor generalization of deep reinforcement learning (DRL) solvers for the Traveling Salesman Problem (TSP). Existing DRL solvers are fast and produce near-optimal tours on small instances, but their solution quality drops significantly as the instance size grows, so they do not generalize to larger problems.

To overcome this challenge, the authors propose a new DRL method called MAGIC (Multilayer Perceptron, Attention, Graph Neural Network, Interleaved local search, and Curriculum Learning). MAGIC improves the performance and generalization ability of DRL solvers through the following innovations:

1. **Model Architecture**: MAGIC adopts a deep learning architecture that combines a multilayer perceptron (MLP), a graph neural network (GNN), and an attention model to define a stochastic policy that generates a TSP solution sequentially.
2. **Training Method**:
   - **Interleaved Policy Gradient Updates and Local Search**: During training, MAGIC interleaves DRL policy gradient updates with local search, where the local search uses a new technique.
   - **Simple Baseline**: A novel, simple baseline (the policy rollout baseline) is used to reduce the variance of the policy gradient estimates.
   - **Curriculum Learning**: Curriculum learning is applied to assist training and improve the model's generalization ability.
3. **Experimental Verification**: Experiments on randomly generated TSP instances show that MAGIC outperforms other DRL-based methods in both performance and generalization ability. MAGIC also compares favorably with traditional TSP heuristics and other state-of-the-art methods in performance and computation time.

In summary, the main goal of this paper is to improve the generalization ability and overall performance of DRL-based TSP solvers on large instances by proposing the MAGIC method.
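
To make the training ideas above more concrete, the following is a minimal sketch of how sampling, local search, and a size curriculum could fit together in one loop. It is not the authors' implementation. The paper describes a new local search technique, whereas this sketch substitutes classic 2-opt. The learned policy is replaced by a uniformly random tour sampler. All function names (`tour_length`, `two_opt`, `sample_tour`, `training_sketch`), the instance sizes, and the step counts are hypothetical choices made for illustration.

```python
import math
import random


def tour_length(points, tour):
    """Total Euclidean length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Classic 2-opt (a stand-in, not the paper's local search):
    reverse a segment whenever doing so strictly shortens the tour."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


def sample_tour(n, rng):
    """Hypothetical stand-in for the stochastic policy: a uniformly random tour."""
    tour = list(range(n))
    rng.shuffle(tour)
    return tour


def training_sketch(sizes=(10, 20, 30), steps_per_size=3, seed=0):
    """Curriculum over instance sizes; each step samples a tour, then refines it."""
    rng = random.Random(seed)
    log = []
    for n in sizes:  # curriculum: small instances first, larger ones later
        for _ in range(steps_per_size):
            points = [(rng.random(), rng.random()) for _ in range(n)]
            sampled = sample_tour(n, rng)
            refined = two_opt(points, sampled)
            # A real trainer would apply a policy gradient update here, using the
            # refined tour length minus a baseline as the learning signal.
            log.append((n, tour_length(points, sampled), tour_length(points, refined)))
    return log
```

Calling `training_sketch()` returns one `(size, sampled_length, refined_length)` record per step. Because 2-opt only accepts strict improvements, the refined length is never longer than the sampled one. In an actual DRL setup, that gap is the kind of signal that interleaving local search with policy updates is meant to exploit.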