Abstract: There have been increasing challenges to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo Zero which is a Go engine that achieved a superhuman level without the domain knowledge of the game. Our framework is redesigned for combinatorial problems, where the final reward might take any real number instead of a binary response, win/lose. In experiments conducted for five kinds of NP-hard problems including {\sc MinimumVertexCover} and {\sc MaxCut}, our method is shown to generalize better to various graphs than S2V-DQN. Furthermore, our method can be combined with recently-developed graph neural network (GNN) models such as the \emph{Graph Isomorphism Network}, resulting in even better performance. This experiment also gives an interesting insight into a suitable choice of GNN models for each task.
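To make the point about real-valued rewards concrete, the following is a minimal sketch of how a finished solution might be scored for two of the problems named in the abstract. The function names and the choice of "negative cover size" as the MinimumVertexCover reward are illustrative assumptions, not definitions taken from the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def max_cut_value(edges: Iterable[Edge], side: Dict[int, int]) -> int:
    """Count edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def vertex_cover_reward(edges: Iterable[Edge], cover: Set[int]) -> int:
    """Return the negative cover size (assumed reward), so smaller covers score higher.

    Raises ValueError if `cover` leaves some edge uncovered.
    """
    if any(u not in cover and v not in cover for u, v in edges):
        raise ValueError("not a vertex cover")
    return -len(cover)


# Hypothetical toy instance: the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cut = max_cut_value(edges, {0: 0, 1: 1, 2: 0, 3: 1})
cover_reward = vertex_cover_reward(edges, {0, 2})
```

Unlike a win/lose signal in Go, both scores depend on the size of the instance, which is why a value network cannot simply predict a number in a fixed range.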

What problem does this paper attempt to address?

This paper extends the AlphaGo Zero method to graphs in order to solve NP-hard combinatorial optimization problems. Specifically, it proposes a new learning strategy named CombOpt Zero, which aims to overcome the poor generalization of existing methods across graphs with different characteristics. The method builds on the reinforcement learning framework of AlphaGo Zero and trains deep neural networks through Monte Carlo Tree Search (MCTS) to solve a variety of NP-hard problems, including MINIMUM VERTEX COVER, MAXCUT and MAXIMUM CLIQUE. CombOpt Zero addresses the difficulty of state-value prediction in combinatorial optimization by introducing a reward normalization technique, and it shows better generalization performance on different graph structures.

The main contributions of the paper include:

1. **Proposing CombOpt Zero**: A new learning strategy for solving combinatorial optimization problems on graphs, with better generalization ability across different types of graphs.
2. **Reward normalization technique**: Addresses the scale problem of state-value prediction in combinatorial optimization, enabling the algorithm to adapt to problems of different scales and complexities.
3. **Experimental verification**: Experiments on a variety of NP-hard problems show that CombOpt Zero generalizes better to different graph structures than the existing S2V-DQN method.

These contributions advance machine-learning approaches to combinatorial optimization and offer new ideas and techniques for future research.
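The summary above does not state how the reward is normalized. As a rough illustration of the idea only, the sketch below assumes one simple form: z-scoring a raw MAXCUT objective against rewards obtained from uniformly random solutions on the same graph. The function names, the rollout count, and the choice of random solutions as the baseline are all assumptions made for this example, not the paper's definition.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def cut_value(edges: Sequence[Edge], side: Dict[int, int]) -> int:
    """Count edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def random_solution_reward(nodes: Sequence[int], edges: Sequence[Edge],
                           rng: random.Random) -> float:
    """Put every node on a uniformly random side and return the cut size."""
    side = {v: rng.randint(0, 1) for v in nodes}
    return float(cut_value(edges, side))


def normalized_reward(raw: float, nodes: Sequence[int], edges: Sequence[Edge],
                      n_rollouts: int = 200, seed: int = 0) -> float:
    """Z-score `raw` against random-solution rewards (assumed normalization)."""
    rng = random.Random(seed)
    samples: List[float] = [
        random_solution_reward(nodes, edges, rng) for _ in range(n_rollouts)
    ]
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    # Fall back to 0.0 when all random solutions score the same.
    return (raw - mean) / std if std > 0 else 0.0


# Hypothetical toy instance: the 4-cycle 0-1-2-3-0.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
best = normalized_reward(4.0, nodes, edges)   # optimal cut
worst = normalized_reward(0.0, nodes, edges)  # empty cut
```

Under this assumption, the value target expresses how much better a solution is than random play on the same graph, so it stays on a comparable scale whether the graph has ten edges or ten thousand.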

Solving NP-Hard Problems on Graphs with Extended AlphaGo Zero

Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework

Learning to Solve Combinatorial Optimization Problems on Real-World Graphs in Linear Time

Enhancing Chess Reinforcement Learning with Graph Representation

AlphaZero Gomoku

Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN

Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

Decision-focused Graph Neural Networks for Combinatorial Optimization

Combinatorial Optimization with Automated Graph Neural Networks

A Graph-Neural-Network-Powered Solver Framework for Graph Optimization Problems

Improved Feature Learning: A Maximum-Average-Out Deep Neural Network for the Game Go

Assessing and Enhancing Graph Neural Networks for Combinatorial Optimization: Novel Approaches and Application in Maximum Independent Set Problems

Graph Q-Learning for Combinatorial Optimization

Solving the QAP by Two-Stage Graph Pointer Networks and Reinforcement Learning

Learning Heuristics over Large Graphs via Deep Reinforcement Learning

Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Monte-Carlo Graph Search for AlphaZero

Mastering construction heuristics with self-play deep reinforcement learning

Game Solving with Online Fine-Tuning

A Novel Approach to Solving Goal-Achieving Problems for Board Games