Abstract: Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time performance is dependent on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching; a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.

What problem does this paper attempt to address?

This paper asks how Reinforcement Learning (RL) can be used effectively for variable selection (branching) in the Branch-and-Bound (B&B) method, so that Mixed-Integer Linear Programmes (MILPs) are solved more efficiently. Specifically, it aims to overcome three major challenges that existing RL methods face when applied to B&B:

1. **Long episodes**: Because B&B trees are deep and wide, the sequence of decisions in an episode is very long, which leads to sparse rewards, difficult credit assignment, and high return variance.
2. **Large state-action spaces**: Each branching step may offer hundreds or thousands of candidate variables, so effective trajectories are hard to find through exploration.
3. **Partial observability**: After a branching decision, the next state is chosen by the node selection policy, which the agent does not control, and the agent observes only part of the tree. Future states are therefore hard to predict.

To address these problems, the authors propose the "retro branching" method. By retrospectively decomposing the search tree into multiple paths, each contained within a sub-tree, the agent learns from shorter trajectories, which reduces return variance and makes next states more predictable. The method addresses the challenges above directly and also allows more complex node selection strategies to be used, so that larger and more complex MILP instances can be handled.

### Specific objectives

- **Improve learning efficiency**: By retrospectively constructing trajectories, RL can learn effective branching strategies in a shorter time.
- **Reduce dependence on expert data**: Unlike Imitation Learning (IL), RL does not require expensive expert-labelled data and can discover new strategies.
- **Scale to larger problems**: By improving observability and reducing exploration difficulty, RL can be applied to larger MILP instances.

### Main contributions

- Propose the retro branching method, which simplifies learning by retrospectively constructing trajectories.
- Run experiments on four combinatorial optimisation tasks. The method outperforms the current state-of-the-art RL branching algorithm by 3-5x and comes within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
- Show through ablations that the retrospectively constructed trajectories are essential to achieving these results.

In conclusion, the paper's goal is to let RL learn variable selection strategies for Branch-and-Bound more effectively by introducing retro branching, thereby making MILP solving more efficient.
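
The retrospective decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tree representation (a dict mapping each node to its children), the function names `subtree_height` and `retrospective_paths`, and the rule for choosing which child a path follows (the child with the deepest sub-tree) are all assumptions made for illustration; the summary does not specify them.

```python
from typing import Dict, Hashable, List

Node = Hashable


def subtree_height(children: Dict[Node, List[Node]], node: Node) -> int:
    """Height of the sub-tree rooted at `node` (a leaf has height 0)."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(subtree_height(children, k) for k in kids)


def retrospective_paths(children: Dict[Node, List[Node]], root: Node) -> List[List[Node]]:
    """Split a finished search tree into root-to-leaf paths of sub-trees.

    Each node ends up in exactly one path. Siblings that a path does not
    follow become the roots of new sub-trees, which are split the same way.
    """
    paths: List[List[Node]] = []
    pending: List[Node] = [root]  # roots of sub-trees still to be split
    while pending:
        node = pending.pop()
        path = [node]
        while children.get(node):
            kids = children[node]
            # Hypothetical selection rule: follow the child whose sub-tree
            # is deepest. Other rules could be substituted here.
            nxt = max(kids, key=lambda k: subtree_height(children, k))
            pending.extend(k for k in kids if k != nxt)
            path.append(nxt)
            node = nxt
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Toy search tree: node 0 is the root, nodes without an entry are leaves.
    tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
    for p in retrospective_paths(tree, 0):
        print(p)
```

In this sketch each path is a candidate training trajectory whose consecutive states are always parent and child, which is what makes the next state more predictable than in the solver's original visiting order. Leaves that become the root of their own sub-tree yield single-node paths; a training pipeline would presumably discard these, since no branching decision is taken at a leaf.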

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

Reinforcement Learning for Variable Selection in a Branch and Bound Algorithm

Branching Reinforcement Learning

Reinforcement Learning for Node Selection in Mixed Integer Programming

Learning to Branch in Mixed Integer Programming

Towards Imitation Learning to Branch for MIP: A Hybrid Reinforcement Learning Based Sample Augmentation Approach

Reinforcement Learning for Node Selection in Branch-and-Bound

Reinforcement Learning Driven Heuristic Optimization

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

Tea flavonols in cardiovascular disease and cancer epidemiology.

Bridging RL Theory and Practice with the Effective Horizon

A novel reinforcement learning-based method for structure optimization

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

Yordle: An Efficient Imitation Learning for Branch and Bound

Bridging the gap between Markowitz planning and deep reinforcement learning

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

Combinatorial Optimization with Policy Adaptation using Latent Space Search

Beyond Trial and Error: Lane Keeping with Monte Carlo Tree Search-Driven Optimization of Reinforcement Learning

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Learning to Optimize for Reinforcement Learning

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization