Abstract: In our everyday lives, we continually need to commit to courses of future action even when direct feedback is not available. The expected reward of an action depends not only on a single decision but on a sequence of interdependent choices that will be made in the future. Indeed, most decisions we make concern sequences of actions rather than single-step choices. To make these decisions, we rely on our ability to plan and to forecast the potential outcomes of sequential decisions. However, how humans plan in novel real-world scenarios remains poorly understood. We developed a novel task to investigate how humans evaluate options and prioritize their decisions during planning in realistic situations. In each trial, a written planning scenario (for example, plan your birthday party) was followed by nine pictures divided into three categories (e.g., three birthday cakes, three party locations, and three decorations). Participants were asked to choose one option from each category to create the best possible plan, that is, the one with the highest subjective value, while both their responses and their gaze were tracked. Because the subjective value of each option may depend on which other options are selected, the required planning process resembles navigating an internal decision tree whose complexity grows exponentially with the number of choices and future outcomes considered. Our results show that participants gather information at all levels of the decision trees, as indicated by their gaze-switching behavior. In addition, based on participants' own assessments of importance and difficulty, we find that they generally choose first the category they report as most important and easiest, and then the one they report as least important and most difficult.
Overall, our task provides a novel means to study planning behavior in realistic, multi-alternative situations in which participants can freely navigate all levels of the decision tree, subjectively evaluating potential scenarios through internal sampling and imagination.
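The combinatorial structure of the task can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the authors' analysis code: the option names, the base values in `BASE` and the pairwise interaction terms in `INTERACTION` are invented assumptions, used only to show how an option's worth can depend on the other selected options. With three categories of three options there are 3^3 = 27 complete plans. Counting intermediate choice points, a tree that visits the categories in a fixed order has 1 + 3 + 9 + 27 = 40 nodes, and letting participants pick the categories in any order multiplies the paths at each depth by the number of ways to order the categories chosen so far.

```python
import math
from itertools import product

# Hypothetical trial: three categories with three options each, mirroring the
# "3 cakes / 3 locations / 3 decorations" example. Labels are placeholders.
CATEGORIES = {
    "cake": ["chocolate", "fruit", "cheesecake"],
    "location": ["garden", "restaurant", "home"],
    "decoration": ["balloons", "lights", "flowers"],
}

# Assumed stand-alone subjective value of each option (invented numbers).
BASE = {
    "chocolate": 3.0, "fruit": 2.0, "cheesecake": 2.5,
    "garden": 2.0, "restaurant": 3.0, "home": 1.0,
    "balloons": 1.0, "lights": 1.5, "flowers": 1.2,
}

# Assumed pairwise interactions: an option's worth changes with the other picks.
INTERACTION = {
    frozenset({"fruit", "garden"}): 2.0,
    frozenset({"lights", "garden"}): 1.5,
    frozenset({"balloons", "restaurant"}): -1.0,
}


def plan_value(plan):
    """Base values plus every pairwise interaction present in the plan."""
    total = sum(BASE[option] for option in plan)
    for i in range(len(plan)):
        for j in range(i + 1, len(plan)):
            total += INTERACTION.get(frozenset({plan[i], plan[j]}), 0.0)
    return total


def best_plan(categories):
    """Exhaustively score every leaf of the tree (one option per category)."""
    plans = list(product(*categories.values()))
    return max(plans, key=plan_value), len(plans)


def greedy_plan(categories):
    """Myopic baseline: best stand-alone option per category, ignoring interactions."""
    return tuple(max(options, key=BASE.get) for options in categories.values())


def tree_size(n_options, n_categories):
    """Nodes when categories are visited in a fixed order: 1 + n + n^2 + ... + n^k."""
    return sum(n_options ** depth for depth in range(n_categories + 1))


def free_order_tree_size(n_options, n_categories):
    """Nodes when the next category can be any remaining one: sum of perm(k, d) * n^d."""
    return sum(
        math.perm(n_categories, depth) * n_options ** depth
        for depth in range(n_categories + 1)
    )


if __name__ == "__main__":
    plan, n_plans = best_plan(CATEGORIES)
    print(n_plans, plan, plan_value(plan))      # 27 ('fruit', 'garden', 'lights') 9.0
    greedy = greedy_plan(CATEGORIES)
    print(greedy, plan_value(greedy))           # ('chocolate', 'restaurant', 'lights') 7.5
    print(tree_size(3, 3), free_order_tree_size(3, 3))  # 40 226
```

In this toy setting, picking the best-looking option in each category independently yields a lower-valued plan (7.5) than evaluating full combinations (9.0), illustrating why interdependent values call for looking across levels of the tree rather than scoring each category in isolation. Adding one more category of three options raises the fixed-order tree from 40 to 121 nodes.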

A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Time‐in‐action RL

Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

The Value Equivalence Principle for Model-Based Reinforcement Learning

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

A New View on Planning in Online Reinforcement Learning

On the role of planning in model-based deep reinforcement learning

The Value of Reward Lookahead in Reinforcement Learning

On shallow planning under partial observability

Model-advantage and value-aware models for model-based reinforcement learning: bridging the gap in theory and practice

Planning with Expectation Models for Control

Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

A reinforcement learning diffusion decision model for value-based decisions

Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Prioritization of decisions by importance and difficulty in human planning

Towards Solving Industrial Sequential Decision-making Tasks under Near-predictable Dynamics via Reinforcement Learning: an Implicit Corrective Value Estimation Approach

Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task

Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning

On Predictive Planning and Counterfactual Learning in Active Inference