Abstract: The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been intensively studied in the past decades. Because it is NP-hard, many heuristic algorithms have been developed to solve it. These algorithms are usually problem-tailored and therefore lack generality and scalability. To address these issues, this paper proposes a heuristic algorithm based on deep reinforcement learning (DRLH). It takes specific features as input and uses a neural network model, called NN, to guide the selection of a variable at each solution construction step. To improve speed and efficiency, two variants are also developed: simplified DRLH (DRLS) and DRLS with hill climbing (DRLS-HC). The three algorithms are evaluated through extensive experiments against well-known heuristic algorithms from the literature. The experimental results show that DRLH, DRLS, and DRLS-HC outperform their competitors in both solution quality and computational efficiency. Specifically, DRLH achieves the best-quality results, while DRLS offers high-quality solutions in a very short time. With a hill-climbing procedure added to DRLS, the resulting DRLS-HC algorithm obtains almost the same solution quality as DRLH while using about 5 times less computing time on average. Additional experiments on large-scale instances and various data distributions verify the generality and scalability of the proposed algorithms, and the results on benchmark instances indicate that the algorithms can be applied to practical problems.
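
To make the hill-climbing idea concrete, the following is a minimal sketch of a generic one-flip local search for UBQP. It is not the DRLS-HC procedure from the paper, whose details the abstract does not give. It assumes the common maximization form, max x^T Q x over x in {0,1}^n, and a best-improvement one-flip neighborhood; the names `ubqp_value`, `flip_gain`, and `hill_climb` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ubqp_value(Q: Matrix, x: List[int]) -> float:
    """Objective x^T Q x for a binary vector x (assumed maximization form)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_gain(Q: Matrix, x: List[int], k: int) -> float:
    """Change in x^T Q x if bit k is flipped, computed in O(n)."""
    # Since x_k is binary, the terms involving k are
    # Q[k][k] * x_k + x_k * sum_{j != k} (Q[k][j] + Q[j][k]) * x_j.
    cross = sum((Q[k][j] + Q[j][k]) * x[j] for j in range(len(x)) if j != k)
    delta = 1 - 2 * x[k]  # +1 for a 0 -> 1 flip, -1 for a 1 -> 0 flip
    return delta * (Q[k][k] + cross)


def hill_climb(Q: Matrix, x: List[int]) -> List[int]:
    """Flip the single best-improving bit until no flip improves the value."""
    x = list(x)  # work on a copy so the caller's vector is untouched
    if not x:
        return x
    while True:
        gains = [flip_gain(Q, x, k) for k in range(len(x))]
        best = max(range(len(x)), key=gains.__getitem__)
        if gains[best] <= 0:
            return x  # local optimum under the one-flip neighborhood
        x[best] = 1 - x[best]


if __name__ == "__main__":
    # Small illustrative instance (made up for this sketch).
    Q = [[1, -2, 0], [-2, 3, 1], [0, 1, -4]]
    x_local = hill_climb(Q, [0, 0, 0])
    print(x_local, ubqp_value(Q, x_local))
```

In a construct-then-improve scheme, a local search of this kind would be run on the solution produced by the constructive step; the result is only a local optimum with respect to single-bit flips.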

An Approximate Quadratic Programming for Efficient Bellman Equation Solution

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

Robust Quadratic Programming for MDPs with uncertain observation noise.

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation.

A multilevel algorithm for large unconstrained binary quadratic optimization

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Parameterized Projected Bellman Operator

Neural Network for Solving Convex Quadratic Bilevel Programming Problems

A Simple And High Performance Neural Network For Quadratic Programming Problems

Stable Training of Bellman Error in Reinforcement Learning

Provably Efficient Q-learning with Function Approximation Via Distribution Shift Error Checking Oracle

Linfa-Q: Accurate Q-Learning with Linear Function Approximation

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Genetic Algorithm for Solving Quadratic Bilevel Programming Problem

An Efficient Unsupervised Framework for Convex Quadratic Programs via Deep Unrolling

An Accelerated Proximal Gradient-Based Algorithm for Quadratic Programming

Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

Toward General Function Approximation in Nonstationary Reinforcement Learning

Quadratic Approximation Greedy Pursuit for Cardinality-Constrained Sparse Learning