Goal Reduction with Loop-Removal Accelerates RL and Models Human Brain Activity in Goal-Directed Learning

Huzi Cheng, Joshua Brown
DOI: https://doi.org/10.1101/2024.03.19.585826
2024-10-13
Abstract: Goal-directed planning presents a challenge for classical RL algorithms due to the vastness of the combinatorial state and goal spaces, while humans and animals adapt to complex environments, especially with diverse, non-stationary objectives, often employing intermediate goals for long-horizon tasks. Here, we propose a goal reduction mechanism for effectively deriving subgoals from arbitrary and distant original goals, using a novel loop-removal technique. The product of the method, called goal-reducer, distills high-quality subgoals from a replay buffer, all without the need for prior global environmental knowledge. Simulations show that the goal-reducer can be integrated into RL frameworks like Deep Q-learning and Soft Actor-Critic. It accelerates performance in both discrete and continuous action space tasks, such as grid world navigation and robotic arm manipulation, relative to the corresponding standard RL models. Moreover, the goal-reducer, when combined with a local policy, without iterative training, outperforms its integrated deep RL counterparts in solving a navigation task. This goal reduction mechanism also models human problem-solving. Comparing the model's performance and activation with human behavior and fMRI data in a treasure hunting task, we found matching representational patterns between a goal-reducer agent's components and corresponding human brain areas, particularly the vmPFC and basal ganglia. The results suggest that humans may use a similar computational framework for goal-directed behaviors.
Neuroscience
What problem does this paper attempt to address?
The paper addresses the difficulty that classical Reinforcement Learning (RL) algorithms have with goal-directed tasks, where the combinatorial state and goal spaces are vast and the goals are diverse and non-stationary. It proposes a goal reduction mechanism that derives subgoals from arbitrary and distant original goals using a novel loop-removal technique. The resulting goal-reducer distills high-quality subgoals from a replay buffer without prior global knowledge of the environment. Integrated into standard RL frameworks such as Deep Q-learning and Soft Actor-Critic, it accelerates performance relative to the corresponding standard models in both discrete and continuous action space tasks, such as grid world navigation and robotic arm manipulation. Combined with a local policy and without iterative training, it also outperforms its integrated deep RL counterparts on a navigation task. Finally, the mechanism models human problem-solving: in a treasure hunting task, representational patterns in the components of a goal-reducer agent match those of corresponding human brain areas in fMRI data, particularly the ventromedial prefrontal cortex (vmPFC) and basal ganglia, suggesting that humans may use a similar computational framework for goal-directed behavior.
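To make the loop-removal idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes that a replay-buffer trajectory is a list of hashable states, that a "loop" is any segment that returns to an already visited state, and that a subgoal is drawn uniformly from the interior of the loop-free path. The function names `remove_loops` and `sample_subgoal` are hypothetical, and the paper's actual sampling scheme and learned goal-reducer network may differ.

```python
import random


def remove_loops(trajectory):
    """Return the trajectory with every revisit loop cut out.

    Assumption: states are hashable (e.g. grid coordinates as tuples).
    """
    path = []
    index_of = {}  # state -> its position in the current loop-free path
    for state in trajectory:
        if state in index_of:
            # The walk came back to a known state: discard the detour
            # taken since the first visit to that state.
            cut = index_of[state]
            for dropped in path[cut + 1:]:
                del index_of[dropped]
            path = path[:cut + 1]
        else:
            index_of[state] = len(path)
            path.append(state)
    return path


def sample_subgoal(trajectory, rng=random):
    """Pick a candidate subgoal strictly between start and goal.

    Returns None when the loop-free path has no interior state.
    Uniform sampling is an illustrative assumption.
    """
    path = remove_loops(trajectory)
    if len(path) < 3:
        return None
    return rng.choice(path[1:-1])


# A random walk on a grid that detours to (1, 1) and comes back.
walk = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 2), (0, 3)]
print(remove_loops(walk))   # [(0, 0), (0, 1), (0, 2), (0, 3)]
print(sample_subgoal(walk))  # one of (0, 1) or (0, 2)
```

Sampling from the loop-free path rather than the raw trajectory keeps detour states such as `(1, 1)` out of the candidate subgoals, which is the intuition behind distilling higher-quality subgoals from exploratory experience.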