A lightweight CNN-transformer model for learning traveling salesman problems

Minseop Jung, Jaeseung Lee, Jibum Kim
DOI: https://doi.org/10.1007/s10489-024-05603-x
IF: 5.3
2024-06-20
Applied Intelligence
Abstract: Several studies have attempted to solve traveling salesman problems (TSPs) using various deep learning techniques. Among them, Transformer-based models show state-of-the-art performance even for large-scale TSPs. However, they are based on fully-connected attention models and suffer from high computational complexity and GPU memory usage. Our work is the first CNN-Transformer model for TSP, based on a CNN embedding layer and partial self-attention. Compared with standard Transformer-based models, our CNN-Transformer model learns spatial features from the input data better by using a CNN embedding layer. It also removes considerable redundancy in fully-connected attention models through the proposed partial self-attention. Experimental results show that the proposed CNN embedding layer and partial self-attention are very effective in improving performance and reducing computational complexity. The proposed model exhibits the best performance on real-world datasets and outperforms other existing state-of-the-art (SOTA) Transformer-based models in various aspects. Our code is publicly available at https://github.com/cm8908/CNN_Transformer3.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the Traveling Salesman Problem (TSP) with deep learning techniques, specifically by combining Convolutional Neural Networks (CNNs) with Transformer models. The research team proposes a lightweight CNN-Transformer model for TSP.

### Research Background
TSP is a classic NP-hard problem that has been widely studied in computer science and operations research. The goal is to find the shortest closed tour in which a "traveling salesman" visits each city exactly once and then returns to the starting city. Because the number of possible tours grows factorially with the number of cities, finding the optimal solution becomes very computationally expensive as instances grow. Researchers have therefore developed various heuristic and approximation algorithms that find high-quality solutions within a reasonable time.

### Solution
The paper proposes a novel CNN-Transformer model that uses a CNN embedding layer to extract spatial features from the input data and a partial self-attention mechanism instead of fully-connected attention. The CNN embedding lets the model better learn the spatial characteristics of the input, while partial self-attention removes redundant connections of the fully-connected attention model, reducing computational complexity and GPU memory usage.

### Main Contributions
1. **CNN-Transformer Model**: This is the first CNN-Transformer model used to solve TSP. Experiments show that the CNN embedding layer is very effective at learning local spatial features of various TSP instances.
2. **Partial Self-Attention Mechanism**: The model performs attention only over the most recently visited nodes, which strengthens its ability to learn local combinatorial properties.
3. **Efficiency Improvement**: By removing redundant attention connections in the decoder, the model significantly reduces GPU memory usage and inference time.
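The two components described above can be illustrated with a small NumPy sketch. It is not the authors' implementation (their code is in the repository linked in the abstract): the k-nearest-neighbour patches, the filter shapes, the max-pooling, the window size `w`, and the use of the last visited node as the attention query are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def knn_conv_embedding(coords, k, kernel, bias):
    """Hypothetical CNN embedding: convolve over each city's k-nearest-neighbour patch.

    coords: (n, 2) city coordinates
    kernel: (d, kw, 2) filters that slide along the neighbour axis (assumed shape)
    bias:   (d,) per-channel bias
    returns (n, d) node embeddings
    """
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # (n, n)
    nn_idx = np.argsort(dist, axis=1)[:, :k]        # nearest first; normally the city itself
    patches = coords[nn_idx] - coords[:, None, :]   # (n, k, 2) coordinates relative to each city
    d, kw, _ = kernel.shape
    positions = k - kw + 1                          # "valid" convolution along the neighbour axis
    feats = np.empty((n, positions, d))
    for p in range(positions):
        window = patches[:, p:p + kw, :]            # (n, kw, 2)
        feats[:, p, :] = np.einsum("nwc,dwc->nd", window, kernel) + bias
    feats = np.maximum(feats, 0.0)                  # ReLU
    return feats.max(axis=1)                        # max-pool over positions -> (n, d)


def partial_self_attention(visited_emb, w, Wq, Wk, Wv):
    """Single-query attention restricted to the w most recently visited nodes (assumed design).

    visited_emb: (t, d) embeddings of the visited nodes, in visiting order.
    Keys and values are built for only min(t, w) nodes, so per-step work
    scales with w rather than with the length t of the partial tour.
    """
    window = visited_emb[-w:]                       # last min(t, w) visited nodes
    q = visited_emb[-1] @ Wq                        # query from the current (last visited) node
    K = window @ Wk
    V = window @ Wv
    scores = K @ q / np.sqrt(q.shape[0])            # scaled dot-product scores, shape (m,)
    weights = np.exp(scores - scores.max())         # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, weights


rng = np.random.default_rng(0)
coords = rng.random((20, 2))
emb = knn_conv_embedding(coords, k=8, kernel=rng.normal(size=(16, 3, 2)), bias=np.zeros(16))
tour_so_far = [0, 5, 3, 7, 12, 1]                   # hypothetical partial tour
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
context, attn = partial_self_attention(emb[tour_so_far], w=3, Wq=Wq, Wk=Wk, Wv=Wv)
print(emb.shape, context.shape, attn.shape)         # (20, 16) (16,) (3,)
```

In this sketch the decoder attends over 3 keys instead of all 6 visited nodes; with a fixed window, the per-step attention cost stays constant as the tour grows instead of growing with it, which is one way to read the "redundant connections" that partial self-attention removes.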
### Experimental Results
The paper validates the proposed method through multiple experiments. The model not only performs best on real-world datasets but also outperforms existing state-of-the-art Transformer-based models on various metrics, notably the optimality gap and the average predicted tour length. The model also shows favorable training and inference times, as well as lower GPU memory consumption.

In summary, this paper provides a novel and efficient deep learning framework that can effectively solve large-scale TSPs and has strong practical value.
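The optimality gap and average tour length used in the results can be made concrete with a short Python sketch. It assumes Euclidean distances and the common definition of the gap, (predicted length - optimal length) / optimal length x 100%; the helper names are hypothetical, and the brute-force solver only supplies a reference optimum for a toy instance, since enumerating all (n - 1)! tours is infeasible beyond a handful of cities.

```python
import math
from itertools import permutations


def tour_length(coords, tour):
    """Length of a closed tour, including the edge back to the starting city."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def optimality_gap(pred_len, opt_len):
    """Percentage by which a predicted tour is longer than the optimal tour."""
    return 100.0 * (pred_len - opt_len) / opt_len


def brute_force_optimum(coords):
    """Exact optimum by enumerating every tour that starts at city 0 (tiny n only)."""
    rest = permutations(range(1, len(coords)))
    best = min(rest, key=lambda p: tour_length(coords, (0,) + p))
    return (0,) + best


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
opt = brute_force_optimum(square)                 # (0, 1, 2, 3), length 4.0
pred = (0, 2, 1, 3)                               # a self-crossing tour a model might output
gap = optimality_gap(tour_length(square, pred), tour_length(square, opt))
print(f"optimality gap: {gap:.2f}%")              # optimality gap: 20.71%
```

Averaging `tour_length` over every instance in a test set gives the average predicted tour length; in practice the reference optima come from exact or strong heuristic solvers rather than enumeration.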