Abstract: Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a lifetime, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, as such a weighting would need to encode information about good and bad combinations prior to an agent's experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to use only the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map an initial state and time when an agent begins a policy to the final states and times of completing a goal. Because these functions are hierarchical operators, we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure & spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth & maintenance of their own structural integrity without recourse to reward-maximization.
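As a rough illustration of what "capacity to realize many possible futures under a transition operator" can mean, the sketch below computes n-step empowerment in the simplest setting: deterministic dynamics, where empowerment reduces to the base-2 logarithm of the number of distinct states reachable by some action sequence of the given length. This is the standard deterministic special case, not the Operator Bellman Equation construction described in the abstract. The corridor world, the `corridor_step` function, the action set, and the horizon are all assumptions made up for the example.

```python
import math
from itertools import product


def n_step_empowerment(state, actions, step, horizon):
    """Empowerment (in bits) under deterministic dynamics.

    With a deterministic transition function, the channel capacity from
    action sequences to final states equals log2 of the number of
    distinct final states reachable in `horizon` steps.
    """
    reachable = set()
    for seq in product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))


# Hypothetical 1-D corridor with cells 0..4; moves are clipped at the walls.
def corridor_step(s, a):
    return min(4, max(0, s + a))


ACTIONS = (-1, 0, 1)  # assumed action set: left, stay, right

center = n_step_empowerment(2, ACTIONS, corridor_step, horizon=2)
wall = n_step_empowerment(0, ACTIONS, corridor_step, horizon=2)
print(f"center: {center:.3f} bits, wall: {wall:.3f} bits")
```

In this toy world the agent in the middle of the corridor can reach five distinct cells in two steps, while the agent at the wall can reach only three, so the central cell has higher empowerment. An empowerment-maximizing agent would, under these assumptions, prefer states that keep more futures open.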

Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

Intrinsic Motivation Exploration Via Self-Supervised Prediction in Reinforcement Learning

A unified strategy for implementing curiosity and empowerment driven reinforcement learning

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Empowerment-driven Exploration using Mutual Information Estimation

Random curiosity-driven exploration in deep reinforcement learning

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration

CIExplore: Curiosity and Influence-based Exploration in Multi-Agent Cooperative Scenarios with Sparse Rewards

An Evaluation Study of Intrinsic Motivation Techniques applied to Reinforcement Learning over Hard Exploration Environments

Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration

End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation

Show me the Way: Intrinsic Motivation from Demonstrations

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Focus on Impact: Indoor Exploration with Intrinsic Motivation

Reward is not Necessary: How to Create a Modular & Compositional Self-Preserving Agent for Life-Long Learning

Successor-Predecessor Intrinsic Exploration

Curiosity-driven Exploration by Self-supervised Prediction

Empowerment contributes to exploration behaviour in a creative video game