Abstract: In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before they can exploit it. Two rewards that encourage efficient exploration are the entropy of the action policy and curiosity for information gain. Entropy is well established in the literature and promotes randomized action selection. Curiosity is defined in many ways in the literature and promotes the discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noise, known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
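
The two curiosity signals named in the abstract can be contrasted with a minimal sketch. It is illustrative only and is not the authors' implementation. It assumes that the prior and posterior over latent variables are diagonal Gaussians, that the divergence is taken as KL(posterior || prior), and that prediction error is a mean squared error. The function names and the weights `eta` and `alpha` are hypothetical.

```python
import math
from typing import Sequence


def hidden_state_curiosity(
    mu_post: Sequence[float],
    sigma_post: Sequence[float],
    mu_prior: Sequence[float],
    sigma_prior: Sequence[float],
) -> float:
    """KL(posterior || prior) for diagonal Gaussians, summed over latent dimensions.

    Assumption: both distributions are diagonal Gaussians given by means and
    standard deviations; the abstract does not specify the distribution family.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_post, sigma_post, mu_prior, sigma_prior):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def prediction_error_curiosity(
    predicted: Sequence[float], observed: Sequence[float]
) -> float:
    """Mean squared error between the predicted and the actual observation."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def total_reward(
    extrinsic: float,
    curiosity: float,
    entropy: float,
    eta: float = 1.0,    # hypothetical curiosity weight
    alpha: float = 0.1,  # hypothetical entropy weight
) -> float:
    """Combine the task reward with weighted curiosity and entropy bonuses."""
    return extrinsic + eta * curiosity + alpha * entropy
```

Under these assumptions, pure observational noise raises the prediction error at every step, whereas the KL term stays small when the noisy observation does not shift the posterior over latent variables away from the prior. This is one way to read the resilience to curiosity traps that the abstract reports.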

Unified Curiosity-Driven Learning with Smoothed Intrinsic Reward Estimation

Attention-based Curiosity-driven Exploration in Deep Reinforcement Learning

Random curiosity-driven exploration in deep reinforcement learning

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

A unified strategy for implementing curiosity and empowerment driven reinforcement learning

ACDER: Augmented Curiosity-Driven Experience Replay

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Curiosity-driven Exploration by Self-supervised Prediction

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration

Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Never Give Up: Learning Directed Exploration Strategies

Curiosity-driven recommendation strategy for adaptive learning via deep reinforcement learning

Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration

Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration

Curiosity-Driven Exploration via Latent Bayesian Surprise