Abstract: Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge in combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes large problems intractable. In recent years, deep reinforcement learning (DRL) has shown promise for designing good heuristics for NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard traveling salesman problem and cannot easily be extended to other problems, and (2) they provide only an approximate solution, with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Because it is based on a complete search procedure, it will always find the optimal solution given enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, which directs how the search space is explored. In this work, we propose a general hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between the two techniques. We show experimentally that our solver efficiently solves two challenging problems: the traveling salesman problem with time windows and the 4-moments portfolio optimization problem. The results show that the proposed framework outperforms the stand-alone RL and CP solutions while being competitive with industrial solvers.

Deep Reinforcement Learning with Credit Assignment for Combinatorial Optimization

Model-based Credit Assignment for Model-free Deep Reinforcement Learning

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Learning to assign credit in reinforcement learning by incorporating abstract relations

Towards Practical Credit Assignment for Deep Reinforcement Learning

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

On the Difficulty of Generalizing Reinforcement Learning Framework for Combinatorial Optimization

A Survey of Temporal Credit Assignment in Deep Reinforcement Learning

Deep reinforcement learning for multi-objective combinatorial optimization: A case study on multi-objective traveling salesman problem

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems

Constrained Combinatorial Optimization with Reinforcement Learning

Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems

Credit assignment in heterogeneous multi-agent reinforcement learning for fully cooperative tasks

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

A Benchmark Study of Deep-RL Methods for Maximum Coverage Problems over Graphs

On Credit Assignment in Hierarchical Reinforcement Learning

Multi-Objective Combinatorial Optimization Algorithm Based on Asynchronous Advantage Actor–Critic and Graph Transformer Networks

Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems