Abstract: Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy offline by interacting with a model of the environment. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected, so the changes are immediately incorporated into decision making. However, MCTS can converge slowly in domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
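
The abstract does not spell out how the policy's action-value estimates enter the search. As an illustration only, the sketch below shows one simple way a stale policy's estimates could be blended with search-based estimates at decision time: a convex combination weighted by a hypothetical parameter `alpha`. The function names, the weighting scheme, and the example values are assumptions made for this sketch and should not be read as the authors' algorithm.

```python
from typing import List, Sequence


def blend_action_values(
    q_policy: Sequence[float],
    q_search: Sequence[float],
    alpha: float,
) -> List[float]:
    """Convex combination of two per-action value estimates.

    q_policy: action-values from a previously learned (possibly stale) policy.
    q_search: action-values estimated by online search on the current model.
    alpha:    hypothetical trust weight on the policy; 1.0 ignores the search,
              0.0 ignores the policy.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(q_policy) != len(q_search):
        raise ValueError("both estimates must cover the same actions")
    return [alpha * qp + (1.0 - alpha) * qs for qp, qs in zip(q_policy, q_search)]


def select_action(
    q_policy: Sequence[float],
    q_search: Sequence[float],
    alpha: float,
) -> int:
    """Return the index of the action with the highest blended value."""
    blended = blend_action_values(q_policy, q_search, alpha)
    return max(range(len(blended)), key=blended.__getitem__)


if __name__ == "__main__":
    # Made-up values for a two-action task such as CartPole (push left / push right).
    # The stale policy prefers action 0; search on the updated model prefers action 1.
    stale_policy_q = [1.0, 0.0]
    search_q = [0.0, 1.0]
    print(select_action(stale_policy_q, search_q, alpha=0.25))  # leans on the search
    print(select_action(stale_policy_q, search_q, alpha=0.75))  # leans on the policy
```

With a low `alpha` the choice follows the search, which reflects the updated environment; with a high `alpha` it follows the learned policy. How such a weight should be chosen is not addressed by this sketch.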

Learning Non-Markovian Decision-Making from State-only Sequences

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

State-only Imitation with Transition Dynamics Mismatch

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Learning Intuitive Physics and One-Shot Imitation Using State-Action-Prediction Self-Organizing Maps

Non-Deterministic Policies in Markovian Decision Processes

Offline Imitation Learning with a Misspecified Simulator

Learning a Decision Module by Imitating Driver's Control Behaviors

Decentralized policy learning with partial observation and mechanical constraints for multiperson modeling

Learning model-based planning from scratch

Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs

Acting in Delayed Environments with Non-Stationary Markov Policies

Inwardly rectifying K+ currents in intermediate cells in the cochlea of gerbils: a possible contribution to the endocochlear potential

Learning Markov State Abstractions for Deep Reinforcement Learning

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

Integrated model of cerebellal supervised learning and basal ganglia's reinforcement learning for mobile robot behavioral decision-making

Opportunistic Learning for Markov Decision Systems with Application to Smart Robots

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Deep Generative Models for Decision-Making and Control

Model-Based Reinforcement Learning Via Imagination with Derived Memory

Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs