Abstract: Humans have the fascinating ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which exploit structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and preference for higher-valued compared to lower-valued contexts. We replicated these results across three independent samples. We simulated three models (a classic flat RL model, a hierarchical RL model, and a hierarchical Bayesian model) and compared their behavior to human results. While the flat RL model captured some aspects of participants’ sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments, and it opens avenues for future research in this field.

Hierarchies of Reward Machines

Reward Machines for Deep RL in Noisy and Uncertain Environments

Maximally Permissive Reward Machines

Disentangled Planning and Control in Vision Based Robotics via Reward Machines

Learning Reward Machines in Cooperative Multi-Agent Tasks

Neural Reward Machines

Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov

Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure

Learning Representations in Model-Free Hierarchical Reinforcement Learning

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective

Learning Robust Reward Machines from Noisy Labels

Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering

Reward-Robust RLHF in LLMs

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

Hierarchical Average-Reward Linearly-solvable Markov Decision Processes

Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Prototypical Reward Network for Data-Efficient RLHF

Computational evidence for hierarchically structured reinforcement learning in humans

RRM: Robust Reward Model Training Mitigates Reward Hacking

Efficient Reinforcement Learning in Probabilistic Reward Machines