Abstract: We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in tree MDPs, for which policy execution involves traversing an exponentially sized tree. Examples of such problems include proving unsatisfiability of a SAT formula, counting the number of solutions of a satisfiable SAT formula, and finding the optimal solution to a mixed-integer program. MCFS algorithms can be seen as extensions of Monte Carlo Tree Search (MCTS) to cases where, rather than finding a good path (solution) within a tree, the problem is to find a small tree within a forest of candidate trees. We instantiate and evaluate our ideas in an algorithm that we dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis is the first RL approach to avoid the prohibitive cost of evaluating policies in an exponentially sized tree. It leverages two key ideas: first, we estimate tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975); second, we query a strong solver at a user-defined depth rather than learning a policy across the whole tree, which focuses our policy search on early decisions that offer the greatest potential for reducing tree size. We matched or exceeded the performance of a strong baseline on three well-known SAT distributions, on problems two orders of magnitude more challenging than those addressed in previous RL studies.
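The tree-size estimate mentioned above can be illustrated with the classic Knuth (1975) estimator. The sketch below is not the authors' implementation: it assumes an explicit, fully built tree (a real DPLL tree would instead be expanded lazily by the branching policy), and the names `Node`, `knuth_sample`, `knuth_mean`, `exact_expectation`, and `count_nodes` are hypothetical. One sample walks a single uniformly random root-to-leaf path. A node at depth k is weighted by the product of the branching factors above it, d_0 * d_1 * ... * d_{k-1}, and these weights are summed. With unit cost per node, the expected value of this sum equals the number of nodes in the tree.

```python
import random
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical explicit search-tree node (stand-in for a DPLL tree)."""
    children: List["Node"] = field(default_factory=list)


def knuth_sample(root: Node, rng: random.Random,
                 cost: Callable[[Node], float] = lambda n: 1.0) -> float:
    """Draw one Knuth (1975) estimate of the total cost of the tree.

    Follows a uniformly random root-to-leaf path; each node on the path is
    weighted by the product of branching factors seen above it. With the
    default unit cost, the result is an unbiased estimate of the node count.
    """
    total, weight, node = 0.0, 1.0, root
    while True:
        total += weight * cost(node)
        if not node.children:
            return total
        weight *= len(node.children)  # multiply in this node's branching factor
        node = rng.choice(node.children)


def knuth_mean(root: Node, samples: int, seed: int = 0) -> float:
    """Average many independent path samples to reduce variance."""
    rng = random.Random(seed)
    return sum(knuth_sample(root, rng) for _ in range(samples)) / samples


def exact_expectation(node: Node, weight: Fraction = Fraction(1)) -> Fraction:
    """E[knuth_sample] with unit cost, computed exactly by enumerating all paths."""
    if not node.children:
        return weight
    d = len(node.children)
    return weight + sum(Fraction(1, d) * exact_expectation(c, weight * d)
                        for c in node.children)


def count_nodes(node: Node) -> int:
    """Exact node count, for comparison with the estimator."""
    return 1 + sum(count_nodes(c) for c in node.children)


# An unbalanced 8-node tree: one sampled path yields 4.0 or 10.0, never 8.0,
# yet the expectation over paths is exactly 8.
tree = Node([Node([Node(), Node()]), Node(), Node([Node([Node()])])])
print(count_nodes(tree), exact_expectation(tree))  # 8 8
```

The design point is cost: each sample touches only one path, so its work grows with tree depth rather than tree size, whereas measuring the size of an exponentially large tree directly requires visiting every node. Individual samples can have high variance, which is why many paths are averaged.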

Policies Grow on Trees: Model Checking Families of MDPs

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

Tableaux for Policy Synthesis for MDPs with PCTL* Constraints

In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search

Strong Simple Policies for POMDPs

Optimal Decision Tree Policies for Markov Decision Processes

Learning Robust Policies for Uncertain Parametric Markov Decision Processes

Policy Graph Pruning and Optimization in Monte Carlo Value Iteration for Continuous-State POMDPs

Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming

Policy Synthesis for Factored MDPs with Graph Temporal Logic Specifications

Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance Properties

Search and Explore: Symbiotic Policy Synthesis in POMDPs

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

What Are the Odds? Improving the foundations of Statistical Model Checking

UNSAT Solver Synthesis via Monte Carlo Forest Search

Robust Almost-Sure Reachability in Multi-Environment MDPs

Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Models and algorithms for skip-free Markov decision processes on trees

Bounded Policy Synthesis for POMDPs with Safe-Reachability Objectives

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Strategy Synthesis in POMDPs via Game-Based Abstractions