Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

Christopher W. F. Parsonson, Alexandre Laterre, Thomas D. Barrett
DOI: https://doi.org/10.48550/arXiv.2205.14345
2022-12-05
Abstract: Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time performance is dependent on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching; a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to use reinforcement learning (RL) effectively for variable selection (branching) in the branch-and-bound (B&B) algorithm, so that mixed integer linear programmes (MILPs) can be solved more efficiently. Specifically, the paper aims to overcome three major challenges that existing RL methods face when applied to B&B:

1. **Long episodes**: Because the B&B tree is both deep and wide, the sequence of decisions in an episode is very long, which leads to sparse rewards, difficult credit assignment, and high return variance.
2. **Large state-action spaces**: Each branching step may have hundreds or thousands of candidate variables, so effective trajectories are hard to find through exploration.
3. **Partial observability**: After a branching decision, the next state is chosen by the node selection strategy, which is outside the agent's control, and the agent observes only part of the tree. Both factors make future states difficult to predict.

To address these problems, the authors propose "retro branching". By retrospectively decomposing the search tree into multiple paths, each contained within a sub-tree, the agent learns from shorter trajectories, which reduces return variance and makes next states more predictable. Besides directly addressing the challenges above, this also allows more complex node selection strategies to be used on larger and more complex MILP instances.

### Specific objectives

- **Improve learning efficiency**: By retrospectively constructing trajectories, RL can learn effective branching strategies in a shorter time.
- **Reduce dependence on expert data**: Unlike imitation learning (IL), RL does not require expensive expert-labelled data and can discover new strategies.
- **Scale to larger problems**: By improving observability and reducing exploration difficulty, RL can be applied to larger MILP instances.
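
The retrospective decomposition described above can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the `retrospective_paths` function, and the rule of always following the child with the deepest sub-tree are assumptions made for illustration, since this summary does not state how paths are chosen or how rewards are assigned. The sketch only shows the general idea of splitting one finished search tree into several shorter, node-disjoint paths.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical B&B node; `children` are the sub-problems created by branching."""
    name: str
    children: list[Node] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of nodes on the longest downward path starting at `node`."""
    return 1 + max((depth(child) for child in node.children), default=0)


def retrospective_paths(root: Node) -> list[list[str]]:
    """Split a finished search tree into node-disjoint downward paths.

    Assumed rule (not taken from the paper): each path follows the child
    with the deepest sub-tree until it reaches a leaf. Children left off
    the path become the starting nodes of later paths.
    """
    paths = []
    pending = [root]  # nodes that still need to start a path
    while pending:
        node = pending.pop(0)
        path = [node]
        while node.children:
            nxt = max(node.children, key=depth)
            pending.extend(c for c in node.children if c is not nxt)
            path.append(nxt)
            node = nxt
        paths.append([n.name for n in path])
    return paths


# Toy search tree with 8 nodes.
tree = Node("root", [
    Node("a", [Node("c", [Node("e")]), Node("d")]),
    Node("b", [Node("f"), Node("g")]),
])
paths = retrospective_paths(tree)
print(paths)
```

In this toy tree, a single episode covering all 8 nodes is replaced by four paths of at most 4 nodes, and every node belongs to exactly one path. Each path also stays within the sub-tree rooted at its first node, so the next node is always a child of the current one rather than whichever node the node selection strategy visits next.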
### Main contributions

- Propose retro branching, which simplifies learning by retrospectively constructing trajectories from the search tree.
- Run experiments on four combinatorial optimization tasks. The method outperforms the current state-of-the-art RL branching algorithm by 3-5x and comes within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
- Show through ablations that the retrospectively constructed trajectories are essential to achieving these results.

In conclusion, the goal of this paper is to enable RL to learn variable selection strategies more effectively in branch-and-bound by introducing retro branching, thereby improving the efficiency of solving MILPs.