Abstract: In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that selects an action $a^* \in \mathcal{A}$ with the aim of optimizing some outcome, such as the value function of a Markov decision process (MDP). However, executing $P$ may itself incur costs (time, energy, limited capacity, etc.), which must be weighed against the explicit utility obtained by making the choice in the underlying decision problem. Accounting for these costs is necessary both to model human behavior accurately and to optimize AI planning, since all physical systems face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, a setting that encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the meta problem's complexity, our solutions are necessarily approximate, but they are nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
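
To make the base-level problem concrete, the following is a minimal sketch of the Bayes-adaptive value of a two-armed Bernoulli bandit. It covers only the underlying BAMDP, not the meta-level problem the paper addresses, and it is not the authors' method. The Beta priors, the finite horizon, the unit reward per success, and the name `bamdp_value` are all assumptions made for illustration.

```python
from functools import lru_cache


def bamdp_value(horizon, prior=(1, 1, 1, 1)):
    """Bayes-optimal expected total reward for a two-armed Bernoulli bandit.

    Assumptions (not from the paper): each arm has an independent Beta prior,
    the belief state is (a1, b1, a2, b2), and each success pays reward 1.
    """

    @lru_cache(maxsize=None)
    def value(a1, b1, a2, b2, steps_left):
        # No pulls left means no further reward can be collected.
        if steps_left == 0:
            return 0.0
        # Bellman backup over beliefs: pick the arm with the larger Q-value.
        return max(q_value(a1, b1, a2, b2, steps_left, arm) for arm in (0, 1))

    def q_value(a1, b1, a2, b2, steps_left, arm):
        if arm == 0:
            p_success = a1 / (a1 + b1)  # posterior mean of arm 1
            after_win = value(a1 + 1, b1, a2, b2, steps_left - 1)
            after_loss = value(a1, b1 + 1, a2, b2, steps_left - 1)
        else:
            p_success = a2 / (a2 + b2)  # posterior mean of arm 2
            after_win = value(a1, b1, a2 + 1, b2, steps_left - 1)
            after_loss = value(a1, b1, a2, b2 + 1, steps_left - 1)
        # Expected immediate reward plus expected value of the updated belief.
        return p_success * (1.0 + after_win) + (1.0 - p_success) * after_loss

    return value(*prior, horizon)


if __name__ == "__main__":
    for h in (1, 2, 5, 10):
        print(h, bamdp_value(h))
```

The recursion enumerates every reachable belief, so its cost grows quickly with the horizon. That growth is one way to see why the computation itself may be worth treating as a resource to be optimized.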

Parameter Decision Making in Adaptive Markov Decision Process with Finite Planning Horizon

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference

Risk probability optimization of finite horizon piecewise deterministic Markov decision processes

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

Decision-Theoretic Planning with non-Markovian Rewards

A Potential-Based Method for Finite-Stage Markov Decision Process

Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

An Adaptive Learning Parameters Algorithm in Three-Way Decision-Theoretic Rough Set Model

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Attention-Based Planning with Active Perception

Adaptive dynamic programming-based hierarchical decision-making of non-affine systems

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Metareasoning in uncertain environments: a meta-BAMDP framework

Parameterized Markov Decision Process and Its Application to Service Rate Control

Optimistic Planning by Regularized Dynamic Programming

An Iterative Decision-Making Scheme for Markov Decision Processes and Its Application to Self-adaptive Systems

Hybrid Heuristic Online Planning for POMDPs

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Hybrid Planning for Dynamic Multimodal Stochastic Shortest Paths

A safe exploration approach to constrained Markov decision processes