Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework
Qi Wang,Yongsheng Hao,Jie Cao
DOI: https://doi.org/10.1016/j.engappai.2021.104422
IF: 8
2021-10-01
Engineering Applications of Artificial Intelligence
Abstract:The combinatorial optimization (CO) problems on the graph are the core and classic problems in artificial intelligence (AI) and operations research (OR). For example, the Vehicle Routing Problem (VRP) and Traveling Salesman Problem (TSP) are fascinating NP-hard problems and have important significance for the existing transportation system. Traditional methods such as heuristics methods, exact algorithms, and solution solvers can already find approximate solutions on small-scale graphs. However, they are helpless for large-scale graphs and other problems with similar structures. Moreover, traditional methods often require artificially designed heuristic functions to aid decision-making. In recent years, more and more work has focused on applying deep learning and reinforcement learning (RL) to learn heuristics, which allows us to learn the internal structure of the graph end-to-end and find the optimal path under the guidance of heuristic rules. However, most of these still need manual assistance, and the RL method used has the problems of low sampling efficiency and small searchable space. This paper proposes a novel framework (called OmegaZero) based on Alphago Zero, which does not prescribe expert experience or label data but is trained through self-play. We divide the learning into two stages: in the first stage, we employ graph attention network (GAT) and GRU to learn node representations and memory history trajectories. In the second stage, we employ Monte Carlo tree search (MCTS) and deep RL to search for the solution space and train the model.
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary