Abstract: Deep reinforcement learning (RL) has recently shown significant benefits in solving combinatorial optimization (CO) problems, reducing reliance on domain expertise, and improving computational efficiency. However, the field lacks a unified benchmark for easy development and standardized comparison of algorithms across diverse CO problems. To fill this gap, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 23 state-of-the-art methods and more than 20 CO problems. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configuration of diverse RL algorithms, neural network architectures, inference techniques, and environments. RL4CO allows researchers to seamlessly navigate existing successes and develop their unique designs, facilitating the entire research process by decoupling science from heavy engineering. We also provide extensive benchmark studies to inspire new insights and future work. RL4CO has attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the lack of a unified benchmarking tool for reinforcement learning in Combinatorial Optimization (CO). Specifically:

1. **Lack of standardized benchmarks**:
   - Although Reinforcement Learning (RL) has made significant progress in solving combinatorial optimization problems, there is currently no unified benchmarking library, which makes it difficult to compare different algorithms.
   - Without standardized benchmarks, researchers cannot analyze past work under consistent implementations and conditions, so it is hard to determine whether one method is superior to another.
2. **Improve research efficiency**:
   - Existing research methods often require a large amount of engineering work, which limits the participation of new researchers and slows further development of existing results.
   - A unified benchmarking library simplifies development, improves training and testing efficiency, and promotes fair and comprehensive performance evaluation.
3. **Promote cross-problem generality**:
   - An important goal of combinatorial optimization is to generalize to multiple problems without a large amount of problem-specific knowledge.
   - Differences between existing implementations make it difficult for new researchers to join the NCO (Neural Combinatorial Optimization) community, and inconsistent comparisons hinder direct performance evaluation.

### Solutions

To fill this gap, the paper introduces **RL4CO**, a unified and extensive benchmarking library with the following characteristics:

1. **Modularity and flexibility**:
   - It provides modular implementations of 27 environments and 23 baseline models that can be combined flexibly and automatically, which facilitates testing, switching between components, and reaching state-of-the-art performance.
   - A customized unified pipeline, built on libraries such as TorchRL, PyTorch Lightning, Hydra, and TensorDict, improves training and testing efficiency.
2. **Standardized evaluation**:
   - Standardized evaluation ensures fair and comprehensive comparisons, enabling researchers to automatically test a wider range of problems from different distributions and to collect insights using the testbed.
3. **Open-source code**:
   - RL4CO is an open-source project that has attracted the participation of many researchers; the code is available on GitHub.

### Main contributions

1. **Simplify development**:
   - The modular implementation simplifies development, allowing researchers to easily test and switch between different combinations of environments and models.
2. **Improve efficiency**:
   - The customized unified pipeline improves training and testing efficiency and reduces repetitive re-engineering work.
3. **Standardized evaluation**:
   - A standardized evaluation method ensures fair and comprehensive performance comparisons and promotes consistent progress in research.

### Conclusion

RL4CO addresses the lack of standardized benchmarks in current NCO research and simplifies the research process through modularity, flexibility, and standardized evaluation, promoting innovation and progress in this field.
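
To make the modular design and standardized evaluation described above more concrete, here is a minimal, self-contained Python sketch. It is not RL4CO's actual API: the registries `ENVS` and `POLICIES`, the `register` decorator, the toy TSP instance generators, and the two simple heuristics (which stand in for learned models) are all hypothetical names introduced for illustration. The sketch only shows the general idea of pairing any registered environment with any registered policy and scoring every policy on the same seeded test instances.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Instance = List[Point]
EnvGenerator = Callable[[int, random.Random], Instance]
Policy = Callable[[Instance, random.Random], List[int]]

# Hypothetical registries for illustration only (not RL4CO's API).
ENVS: Dict[str, EnvGenerator] = {}
POLICIES: Dict[str, Policy] = {}


def register(registry: dict, name: str):
    """Decorator that stores a function in a registry under a string key."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(ENVS, "tsp_uniform")
def tsp_uniform(num_nodes: int, rng: random.Random) -> Instance:
    # Node coordinates drawn uniformly from the unit square.
    return [(rng.random(), rng.random()) for _ in range(num_nodes)]


@register(ENVS, "tsp_clustered")
def tsp_clustered(num_nodes: int, rng: random.Random) -> Instance:
    # A different instance distribution: nodes scattered around 3 centers.
    centers = [(rng.random(), rng.random()) for _ in range(3)]
    points = []
    for _ in range(num_nodes):
        cx, cy = rng.choice(centers)
        points.append((cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)))
    return points


@register(POLICIES, "random")
def random_policy(instance: Instance, rng: random.Random) -> List[int]:
    # Baseline: visit the nodes in a random order.
    tour = list(range(len(instance)))
    rng.shuffle(tour)
    return tour


@register(POLICIES, "nearest_neighbor")
def nearest_neighbor(instance: Instance, rng: random.Random) -> List[int]:
    # Greedy heuristic: always move to the closest unvisited node.
    unvisited = set(range(1, len(instance)))
    tour = [0]
    while unvisited:
        last = instance[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, instance[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(instance: Instance, tour: List[int]) -> float:
    # Length of the closed tour (returns to the starting node).
    n = len(tour)
    return sum(
        math.dist(instance[tour[i]], instance[tour[(i + 1) % n]])
        for i in range(n)
    )


def evaluate(env_name: str, policy_name: str, num_instances: int = 20,
             num_nodes: int = 20, seed: int = 0) -> float:
    # The data RNG depends only on the seed, so every policy is scored
    # on exactly the same test instances.
    data_rng = random.Random(seed)
    instances = [ENVS[env_name](num_nodes, data_rng)
                 for _ in range(num_instances)]
    policy_rng = random.Random(seed + 1)
    policy = POLICIES[policy_name]
    costs = []
    for instance in instances:
        tour = policy(instance, policy_rng)
        # Feasibility check: each node must be visited exactly once.
        assert sorted(tour) == list(range(len(instance)))
        costs.append(tour_length(instance, tour))
    return sum(costs) / len(costs)


def benchmark(seed: int = 0) -> Dict[Tuple[str, str], float]:
    # Evaluate every (environment, policy) combination under one protocol.
    return {(e, p): evaluate(e, p, seed=seed) for e in ENVS for p in POLICIES}


if __name__ == "__main__":
    for (env_name, policy_name), cost in benchmark().items():
        print(f"{env_name:15s} {policy_name:18s} mean tour length {cost:.3f}")
```

Because environments and policies are looked up only by name, adding a new problem or method in this sketch means registering one more function, and `benchmark` picks it up without changes to the evaluation code. That separation is the design property the summary attributes to RL4CO; the library itself builds on TorchRL, PyTorch Lightning, Hydra, and TensorDict rather than plain Python registries.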

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark

Deep Reinforcement Learning with Credit Assignment for Combinatorial Optimization

A Benchmark Study of Deep-RL Methods for Maximum Coverage Problems over Graphs

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Constrained Combinatorial Optimization with Reinforcement Learning

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

On the Difficulty of Generalizing Reinforcement Learning Framework for Combinatorial Optimization

DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization

ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems

Learning to Solve Combinatorial Optimization under Positive Linear Constraints via Non-Autoregressive Neural Networks

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

RRLS: Robust Reinforcement Learning Suite

Reinforcement Learning Driven Heuristic Optimization

ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data

Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization