Goal Reduction with Loop-Removal Accelerates RL and Models Human Brain Activity in Goal-Directed Learning

Huzi Cheng, Joshua Brown

DOI: https://doi.org/10.1101/2024.03.19.585826

2024-10-13

Abstract: Goal-directed planning presents a challenge for classical RL algorithms due to the vastness of the combinatorial state and goal spaces, while humans and animals adapt to complex environments, especially with diverse, non-stationary objectives, often employing intermediate goals for long-horizon tasks. Here, we propose a goal reduction mechanism for effectively deriving subgoals from arbitrary and distant original goals, using a novel loop-removal technique. The product of the method, called goal-reducer, distills high-quality subgoals from a replay buffer, all without the need for prior global environmental knowledge. Simulations show that the goal-reducer can be integrated into RL frameworks like Deep Q-learning and Soft Actor-Critic. It accelerates performance in both discrete and continuous action space tasks, such as grid world navigation and robotic arm manipulation, relative to the corresponding standard RL models. Moreover, the goal-reducer, when combined with a local policy, without iterative training, outperforms its integrated deep RL counterparts in solving a navigation task. This goal reduction mechanism also models human problem-solving. Comparing the model's performance and activation with human behavior and fMRI data in a treasure hunting task, we found matching representational patterns between a goal-reducer agent's components and corresponding human brain areas, particularly the vmPFC and basal ganglia. The results suggest that humans may use a similar computational framework for goal-directed behaviors.

Neuroscience

What problem does this paper attempt to address?

The paper addresses the difficulty classical Reinforcement Learning (RL) algorithms have with goal-directed tasks, where the combined state and goal spaces are very large and the goals may be diverse and non-stationary. It proposes a goal reduction mechanism that derives subgoals from arbitrary, distant original goals using a novel loop-removal technique. The resulting component, the goal-reducer, extracts high-quality subgoals from the replay buffer without prior global knowledge of the environment. In simulations, integrating the goal-reducer into standard RL frameworks such as Deep Q-learning and Soft Actor-Critic accelerates learning on both discrete and continuous action space tasks, including grid world navigation and robotic arm manipulation. When combined with a local policy and used without iterative training, the goal-reducer also outperforms its integrated deep RL counterparts on a navigation task. Finally, the study reports that the mechanism models human problem-solving: in a treasure hunting task, the representational patterns of the goal-reducer agent's components match those of corresponding brain areas in human fMRI data, particularly the ventromedial prefrontal cortex (vmPFC) and basal ganglia, suggesting that humans may use a similar computational framework for goal-directed behavior.
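The summary above does not spell out how loop removal works, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes that a trajectory from the replay buffer is a sequence of hashable states, that a "loop" is any segment between two visits to the same state, and that a subgoal for the pair (first state, last state) can be drawn from the intermediate states of the loop-free path. The names `remove_loops` and `sample_subgoal` are hypothetical.

```python
import random
from typing import Hashable, List, Optional, Sequence


def remove_loops(trajectory: Sequence[Hashable]) -> List[Hashable]:
    """Return the trajectory with every revisit cycle cut out (assumed definition)."""
    path: List[Hashable] = []
    index_of = {}  # state -> position of that state in `path`
    for state in trajectory:
        if state in index_of:
            # Revisited state: discard everything recorded after its first visit.
            keep = index_of[state] + 1
            for dropped in path[keep:]:
                del index_of[dropped]
            del path[keep:]
        else:
            index_of[state] = len(path)
            path.append(state)
    return path


def sample_subgoal(trajectory: Sequence[Hashable], rng=random) -> Optional[Hashable]:
    """Pick a state strictly between start and goal on the loop-free path."""
    path = remove_loops(trajectory)
    if len(path) < 3:
        return None  # no intermediate state is available
    return path[rng.randrange(1, len(path) - 1)]


# A short grid-world walk that detours to (1, 1) and comes back.
walk = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 2), (0, 3)]
print(remove_loops(walk))    # the detour through (1, 1) is removed
print(sample_subgoal(walk))  # either (0, 1) or (0, 2)
```

Under these assumptions, states visited only on detours can never be proposed as subgoals, which is one way a filter of this kind could raise subgoal quality when trajectories come from random or exploratory behavior.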

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning

Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

Generating Attentive Goals for Prioritized Hindsight Reinforcement Learning

Human-in-the-Loop Reinforcement Learning in Continuous-Action Space

State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

Quantile Regression Hindsight Experience Replay

Reconciling Spatial and Temporal Abstractions for Goal Representation

Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

A Goal-Conditioned Reinforcement Learning Algorithm with Environment Modeling

Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation

Learning Subgoal Representations with Slow Dynamics

GRAC: Self-Guided and Self-Regularized Actor-Critic

Accelerated Robot Learning via Human Brain Signals

HG2P: Hippocampus-inspired High-reward Graph and Model-Free Q-Gradient Penalty for Path Planning and Motion Control

Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning

Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization

Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder