Abstract: Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy offline by interacting with a model of the environment. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected, so the changes are incorporated into decision making immediately. However, MCTS can converge slowly in domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
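The abstract does not say how the policy's action-value estimates are merged with the search. As a minimal sketch of one plausible scheme, the Python below blends the two per action with a convex weight. The weight `alpha`, the function names, and the toy numbers are all assumptions made for illustration and are not taken from the abstract.

```python
def combine_estimates(q_policy, g_search, alpha):
    """Blend per-action value estimates from a learned policy and a tree search.

    q_policy: dict mapping action -> action-value estimate from the (possibly stale) policy.
    g_search: dict mapping action -> return estimate from search in the current environment.
    alpha:    hypothetical weight in [0, 1]; 1.0 trusts only the policy, 0.0 only the search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if set(q_policy) != set(g_search):
        raise ValueError("both estimates must cover the same actions")
    return {a: alpha * q_policy[a] + (1.0 - alpha) * g_search[a] for a in q_policy}


def select_action(q_policy, g_search, alpha):
    """Pick the action with the highest blended score."""
    scores = combine_estimates(q_policy, g_search, alpha)
    return max(scores, key=scores.get)


# Toy example (made-up numbers): the policy was trained before a shift and
# still prefers "left", while search in the updated model prefers "right".
q_policy = {"left": 1.0, "right": 0.4}
g_search = {"left": 0.2, "right": 0.9}

policy_only = select_action(q_policy, g_search, alpha=1.0)
search_only = select_action(q_policy, g_search, alpha=0.0)
blended = select_action(q_policy, g_search, alpha=0.5)
```

In this sketch, a larger `alpha` leans on the learned policy, which is cheap but may be stale after an environmental shift. A smaller `alpha` leans on the search, which reflects the current model but needs more iterations to give reliable estimates.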

A Policy-Based Monte Carlo Tree Search Method for Container Pre-Marshalling

Deep Learning Assisted Heuristic Tree Search for the Container Pre-marshalling Problem

A Decision-Tree Stacking Heuristic Minimising the Expected Number of Reshuffles at A Container Terminal

An optimization model for the container pre-marshalling problem

An Iterative Three-Stage Algorithm for the Pre-Marshalling Problem in Container Terminals

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

C-MCTS: Safe Planning with Monte Carlo Tree Search

Integrated scheduling optimization of U-shaped automated container terminal under loading and unloading mode

Container pre-marshalling problem minimizing CV@R under uncertainty of ship arrival times

Fittest Survival: an Enhancement Mechanism for Monte Carlo Tree Search

An Optimal Computing Budget Allocation Tree Policy for Monte Carlo Tree Search

Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes

Multi-Stage Monte Carlo Tree Search for Non-Monotone Object Rearrangement Planning in Narrow Confined Environments

Monte Carlo Tree Search: a review of recent modifications and applications

Generalized Mean Estimation in Monte-Carlo Tree Search

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Machine Learning-Driven Algorithms for the Container Relocation Problem

Solving the unit-load pre-marshalling problem in block stacking storage systems with multiple access directions

Joint Optimization of Pre-Marshalling and Yard Cranes Deployment in the Export Block

An Analysis on the Effects of Evolving the Monte Carlo Tree Search Upper Confidence for Trees Selection Policy on Unimodal, Multimodal and Deceptive Landscapes