Abstract: Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy by interacting with a model of the environment offline. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected, so the changes are immediately incorporated into decision making. However, MCTS's convergence can be slow for domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
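
The abstract does not spell out how the policy's action-value estimates enter the search. As a minimal sketch of one possibility, assume the final action is chosen from a convex combination of the (possibly stale) policy estimates and the estimates returned by the search on the updated model. The weight `alpha`, the function names, and the example numbers below are hypothetical and only illustrate the idea; they are not taken from the paper.

```python
from typing import List, Sequence


def blend_action_values(
    policy_q: Sequence[float],
    search_q: Sequence[float],
    alpha: float,
) -> List[float]:
    """Blend per-action estimates from a learned policy and from tree search.

    Assumption (not from the abstract): a convex combination with weight
    `alpha` on the policy estimates and `1 - alpha` on the search estimates.
    """
    if len(policy_q) != len(search_q):
        raise ValueError("both estimate vectors must cover the same actions")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * q + (1.0 - alpha) * g for q, g in zip(policy_q, search_q)]


def select_action(
    policy_q: Sequence[float],
    search_q: Sequence[float],
    alpha: float,
) -> int:
    """Return the index of the action with the highest blended estimate."""
    blended = blend_action_values(policy_q, search_q, alpha)
    return max(range(len(blended)), key=blended.__getitem__)


if __name__ == "__main__":
    # Hypothetical two-action example: the stale policy prefers action 0,
    # while search on the updated model prefers action 1.
    stale_policy_q = [1.0, 0.2]
    updated_search_q = [0.1, 0.9]
    for alpha in (0.0, 0.25, 1.0):
        print(alpha, select_action(stale_policy_q, updated_search_q, alpha))
```

With `alpha = 1.0` the choice reduces to the policy acting alone, and with `alpha = 0.0` it reduces to the search acting alone; intermediate values trade off trust in the learned policy against trust in the search on the updated model.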

Online model adaptation in Monte Carlo tree search planning

Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

Bayesian Optimized Monte Carlo Planning

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Planning spatial networks with Monte Carlo tree search

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

A Bayesian Approach to Online Planning

Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

Efficient Multi-agent Reinforcement Learning by Planning

Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning

TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Scalable Online Planning via Reinforcement Learning Fine-Tuning

A Safe Exploration Strategy for Model-free Task Adaptation in Safety-constrained Grid Environments

Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

Beyond Trial and Error: Lane Keeping with Monte Carlo Tree Search-Driven Optimization of Reinforcement Learning

Learning Reward Models for Cooperative Trajectory Planning with Inverse Reinforcement Learning and Monte Carlo Tree Search