Abstract: Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy offline by interacting with a model of the environment. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected, so the changes are immediately incorporated into decision making. However, MCTS's convergence can be slow for domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
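To make the idea of integrating a policy's action-value estimates into search concrete, the sketch below shows one simple way such a combination could look: a convex blend of the policy's estimates and the returns estimated by the search at the root. The abstract does not specify the combination rule, so the blending formula, the weight `alpha`, and the names `blend_action_values` and `select_action` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Hashable


def blend_action_values(
    q_policy: Dict[Hashable, float],
    g_search: Dict[Hashable, float],
    alpha: float,
) -> Dict[Hashable, float]:
    """Blend policy and search estimates per action (hypothetical rule).

    q_policy: action-value estimates from a (possibly stale) learned policy.
    g_search: return estimates for the same actions from an online search.
    alpha:    assumed weight in [0, 1]; 1.0 trusts only the policy,
              0.0 trusts only the search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if set(q_policy) != set(g_search):
        raise ValueError("both estimates must cover the same actions")
    return {
        action: alpha * q_policy[action] + (1.0 - alpha) * g_search[action]
        for action in q_policy
    }


def select_action(
    q_policy: Dict[Hashable, float],
    g_search: Dict[Hashable, float],
    alpha: float,
) -> Hashable:
    """Pick the action with the highest blended estimate."""
    blended = blend_action_values(q_policy, g_search, alpha)
    return max(blended, key=blended.get)


if __name__ == "__main__":
    # Illustrative numbers only: the policy prefers "left", while the search
    # (run on the updated environment) prefers "right".
    q = {"left": 1.0, "right": 0.0}
    g = {"left": 0.2, "right": 0.9}
    for a in (1.0, 0.5, 0.25, 0.0):
        print(a, select_action(q, g, a))
```

Under this assumed rule, a weight near 1 reproduces the policy's choice, a weight near 0 reproduces the search's choice, and intermediate values let the search override the policy only when its estimates disagree strongly enough.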

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

SPO: Sequential Monte Carlo Policy Optimisation

Continuous Monte Carlo Graph Search

Enhancing Reinforcement Learning Through Guided Search

Non-Deterministic Policies in Markovian Decision Processes

Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

Bayesian Optimized Monte Carlo Planning

Learning Non-Markovian Decision-Making from State-only Sequences

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

Efficient Offline Policy Optimization with a Learned Model

Acting in Delayed Environments with Non-Stationary Markov Policies

Dual Monte Carlo Tree Search

Monte Carlo Tree Search with Boltzmann Exploration

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Multi-Agent Planning under Uncertainty with Monte Carlo Q-Value Function