Abstract: Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a life, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, as such a weighting would need to encode information about good and bad combinations prior to an agent's experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to use only the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map an initial state and time when an agent begins a policy to the final states and times of completing a goal. Because these functions are hierarchical operators we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure & spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth & maintenance of their own structural integrity without recourse to reward-maximization.
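As a rough illustration of the empowerment measure mentioned in the abstract, the sketch below computes n-step empowerment for a toy deterministic environment. It is not the hierarchical, Operator-Bellman-based method the abstract proposes. It relies only on the standard fact that, when transitions are deterministic, the channel capacity from action sequences to final states reduces to the logarithm of the number of distinct reachable final states. The corridor world, `N_CELLS`, `ACTIONS`, `step`, and `empowerment` are all hypothetical names chosen for this example.

```python
import math
from itertools import product

# Hypothetical toy world: a 1-D corridor of 5 cells (assumption for illustration).
N_CELLS = 5
ACTIONS = (-1, 0, 1)  # move left, stay, move right


def step(state, action):
    """Deterministic transition: move, then clip at the corridor walls."""
    return min(max(state + action, 0), N_CELLS - 1)


def empowerment(state, horizon):
    """n-step empowerment in bits, assuming deterministic dynamics.

    For deterministic transitions this equals log2 of the number of
    distinct final states reachable by some action sequence of length
    `horizon` starting from `state`.
    """
    finals = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        finals.add(s)
    return math.log2(len(finals))


if __name__ == "__main__":
    for s in range(N_CELLS):
        print(s, round(empowerment(s, horizon=2), 3))
```

In this toy setting the middle of the corridor has higher empowerment than the cells next to a wall, because more distinct futures are reachable from it. An empowerment-driven agent would therefore prefer states that keep more options open, which is the intuition the abstract builds on.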

Resource-rational reinforcement learning and sensorimotor causal states, and resource-rational maximiners

Reward is not Necessary: How to Create a Modular & Compositional Self-Preserving Agent for Life-Long Learning

Dynamic allocation of limited memory resources in reinforcement learning

A minimal model of cognition based on oscillatory and current-based reinforcement processes

Humans are resource-rational predictors in a sequence learning task

Modeling sensory-motor decisions in natural behavior

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Modular inverse reinforcement learning for visuomotor behavior

A dopamine mechanism for reward maximization

Meta-Learning Strategies through Value Maximization in Neural Networks

Optimality-based reward learning with applications to toxicology

Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning

Model-Free Robust Optimal Feedback Mechanisms of Biological Motor Control

Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

A Computational Theory of Learning Flexible Reward-Seeking Behavior with Place Cells

Maximal Algorithmic Caliber and Algorithmic Causal Network Inference: General Principles of Real-World General Intelligence?

A theory of cerebral learning regulated by the reward system. I. Hypotheses and mathematical description

Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning

Discovering Cognitive Strategies with Tiny Recurrent Neural Networks

Leveraging conscious and nonconscious learning for efficient AI