Solving NP-Hard Problems on Graphs with Extended AlphaGo Zero

Kenshin Abe, Zijian Xu, Issei Sato, Masashi Sugiyama
DOI: https://doi.org/10.48550/arXiv.1905.11623
2020-03-07
Abstract: There have been increasing attempts to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo Zero, a Go engine that achieved a superhuman level without domain knowledge of the game. Our framework is redesigned for combinatorial problems, where the final reward can be any real number rather than a binary win/lose outcome. In experiments on five kinds of NP-hard problems, including MinimumVertexCover and MaxCut, our method is shown to generalize better to various graphs than S2V-DQN. Furthermore, our method can be combined with recently developed graph neural network (GNN) models such as the Graph Isomorphism Network, resulting in even better performance. This experiment also gives an interesting insight into a suitable choice of GNN models for each task.
Machine Learning
What problem does this paper attempt to address?
This paper addresses NP-hard combinatorial optimization problems on graphs by extending the AlphaGo Zero method. Specifically, it proposes a new learning strategy named CombOpt Zero, which aims to overcome the poor generalization of existing methods across graphs with different characteristics. The method builds on the reinforcement learning framework of AlphaGo Zero and trains deep neural networks with Monte Carlo Tree Search (MCTS) to solve a variety of NP-hard problems, including MinimumVertexCover, MaxCut and MaximumClique. CombOpt Zero handles the difficulty of state-value prediction in combinatorial optimization by introducing a reward normalization technique, and it shows better generalization performance on different graph structures.

The main contributions of the paper include:

1. **Proposing CombOpt Zero**: A new learning strategy for solving combinatorial optimization problems on graphs, with better generalization across different types of graphs.
2. **Reward normalization technique**: Addresses the scale problem of state-value prediction in combinatorial optimization, where the final reward is a real number rather than a win/lose signal, so that the algorithm can adapt to problems of different sizes and complexities.
3. **Experimental verification**: Experiments on a variety of NP-hard problems show that CombOpt Zero generalizes better to different graph structures than the existing S2V-DQN method.

These contributions advance machine-learning solutions to combinatorial optimization problems and provide new ideas and techniques for future research.
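
To make the reward normalization idea more concrete, the following is a minimal sketch on MaxCut. The summary above does not spell out the normalization formula, so the z-score against random-rollout rewards used here is an assumption for illustration, and the helper names `cut_size`, `random_rollout_rewards` and `normalize_reward` are hypothetical rather than taken from the paper's code.

```python
import random
import statistics


def cut_size(edges, side):
    """Count edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def random_rollout_rewards(edges, num_nodes, num_rollouts, rng):
    """Sample uniformly random partitions and return their cut sizes.

    These rewards act as a per-graph baseline (an assumption of this sketch).
    """
    rewards = []
    for _ in range(num_rollouts):
        side = [rng.randint(0, 1) for _ in range(num_nodes)]
        rewards.append(cut_size(edges, side))
    return rewards


def normalize_reward(reward, baseline_rewards, eps=1e-8):
    """Rescale a raw reward to a z-score relative to the baseline rewards.

    eps avoids division by zero when all baseline rewards are equal.
    """
    mean = statistics.fmean(baseline_rewards)
    std = statistics.pstdev(baseline_rewards)
    return (reward - mean) / (std + eps)


# A 4-cycle: the alternating partition cuts all four edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(0)
baseline = random_rollout_rewards(edges, num_nodes=4, num_rollouts=50, rng=rng)
best = cut_size(edges, [0, 1, 0, 1])
normalized_best = normalize_reward(best, baseline)
```

With this kind of rescaling, a value network would be trained to predict a quantity of comparable magnitude on small and large graphs, instead of a raw cut size that grows with the number of edges.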