Abstract: In decision-making scenarios, \textit{reasoning} can be viewed as an algorithm $P$ that chooses an action $a^* \in \mathcal{A}$ in order to optimize some outcome, such as maximizing the value function of a Markov decision process (MDP). However, executing $P$ may itself incur costs (time, energy, limited capacity, etc.), and these must be weighed alongside the explicit utility obtained by making the choice in the underlying decision problem. Such costs need to be taken into account both to model human behavior accurately and to optimize AI planning, since all physical systems face resource constraints. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework that handles metareasoning in environments with unknown reward and transition distributions, a far larger and more realistic set of the planning problems that humans and AI systems face. As a first step, we apply the framework to two-armed Bernoulli bandit (TABB) tasks, which have often been used to study human decision making. Owing to the complexity of the meta problem, our solutions are necessarily approximate, but they are nevertheless robust within a range of assumptions that are arguably realistic for human decision-making scenarios. These results offer a normative framework for understanding human exploration under cognitive constraints. This integration of Bayesian adaptive strategies with metareasoning enriches both the theoretical landscape of decision-making research and practical applications in designing AI systems that plan under uncertainty and resource constraints.
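To make the underlying decision problem concrete, the following is a minimal sketch of the standard Bayes-adaptive value recursion for a two-armed Bernoulli bandit. It is not the meta-BAMDP method of the paper: it ignores computational costs entirely and assumes independent uniform Beta(1, 1) priors on both arms, a finite horizon, and no discounting. The function names `value` and `q_value` are illustrative choices, not taken from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, steps_left):
    """Bayes-optimal expected total reward from a belief state.

    The belief state is the count of successes (s) and failures (f)
    observed so far on each arm, under assumed Beta(1, 1) priors.
    """
    if steps_left == 0:
        return 0.0
    return max(q_value(s1, f1, s2, f2, steps_left, arm) for arm in (0, 1))


def q_value(s1, f1, s2, f2, steps_left, arm):
    """Expected total reward of pulling `arm` now, then acting optimally."""
    if arm == 0:
        # Posterior mean of a Beta(1 + s1, 1 + f1) belief.
        p = (s1 + 1) / (s1 + f1 + 2)
        win = value(s1 + 1, f1, s2, f2, steps_left - 1)
        lose = value(s1, f1 + 1, s2, f2, steps_left - 1)
    else:
        p = (s2 + 1) / (s2 + f2 + 2)
        win = value(s1, f1, s2 + 1, f2, steps_left - 1)
        lose = value(s1, f1, s2, f2 + 1, steps_left - 1)
    # A success yields reward 1 and updates the belief; a failure yields 0.
    return p * (1.0 + win) + (1.0 - p) * lose


if __name__ == "__main__":
    for horizon in (1, 2, 5, 10):
        print(horizon, value(0, 0, 0, 0, horizon))
```

The recursion visits every reachable belief state, so its cost grows quickly with the horizon. That growth is the kind of reasoning cost that a metareasoning agent would have to trade off against the reward gained from planning further ahead.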

A Monte Carlo Algorithm for Universally Optimal Bayesian Sequence Prediction and Planning

Universal Algorithmic Intelligence: A mathematical top->down approach

On Predictive Planning and Counterfactual Learning in Active Inference

A Bayesian Approach to Online Planning

Provably Bounded-Optimal Agents

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games

Optimal simulation-based Bayesian decisions

Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning

Probabilistic design of optimal sequential decision-making algorithms in learning and control

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Cooperative Bayesian Optimization for Imperfect Agents

On efficient computation in active inference

Amortized Bayesian Decision Making for simulation-based models

A Bayesian Optimization through Sequential Monte Carlo and Statistical Physics-Inspired Techniques

Metareasoning in uncertain environments: a meta-BAMDP framework

Probabilistic programs for inferring the goals of autonomous agents

Nested Reasoning About Autonomous Agents Using Probabilistic Programs

Bayesian Design Principles for Frequentist Sequential Learning

Planning with Biological Neurons and Synapses

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models