Abstract: In realistic sparse-reward tasks, existing theoretical methods cannot be applied effectively because rewarded episodes are sampled with low probability. Methods based on intrinsic rewards have been studied extensively to address this issue, but exploration with sparse rewards remains a major challenge. This paper describes the loop enhancement effect in exploration with sparse rewards: after each fully trained iteration, the execution probability of ineffective actions is higher than that of other suboptimal actions, which violates the principles of biological habitual behavior and hinders effective training. The paper proposes theorems on relieving the loop enhancement effect during exploration with sparse rewards, together with a heuristic exploration method based on action effectiveness constraints (AEC) that improves policy training efficiency by relieving this effect. The method is inspired by the fact that animals form habitual behaviors and goal-directed behaviors through the dorsolateral striatum and the dorsomedial striatum, respectively. The function of the dorsolateral striatum is simulated by an action effectiveness evaluation mechanism (A2EM), which aims to reduce the rate of ineffective samples and raise the expected episode reward. The function of the dorsomedial striatum is simulated by an agent policy network, which aims to achieve the task goals. Iterative training of A2EM and the policy forms the AEC model structure: A2EM provides effective samples for the agent policy, and the agent policy provides training constraints for A2EM. The experimental results show that A2EM relieves the loop enhancement effect and has good interpretability and generalizability. AEC enables agents to reduce the loop rate in samples, collect more effective samples, and improve the efficiency of policy training. The performance of AEC demonstrates the effectiveness of a biologically inspired heuristic approach that simulates the function of the dorsal striatum. This approach can be used to improve the robustness of agent exploration with sparse rewards.
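
The abstract refers to the rate of ineffective samples and the loop rate in samples without defining either quantity. As a rough illustration only, the Python sketch below assumes that an ineffective action is one that leaves the observed state unchanged, and that a loop is a transition returning to a state already visited in the same episode. Both definitions, and the function names `ineffective_action_rate` and `loop_rate`, are assumptions made for this example and are not taken from the paper.

```python
from typing import Hashable, Sequence


def ineffective_action_rate(states: Sequence[Hashable]) -> float:
    """Fraction of transitions whose next state equals the current state.

    Assumed definition of an "ineffective" action, for illustration only.
    """
    transitions = len(states) - 1
    if transitions <= 0:
        return 0.0
    unchanged = sum(1 for s, s_next in zip(states, states[1:]) if s == s_next)
    return unchanged / transitions


def loop_rate(states: Sequence[Hashable]) -> float:
    """Fraction of transitions that land on a state already visited.

    Assumed definition of a "loop", for illustration only.
    """
    transitions = len(states) - 1
    if transitions <= 0:
        return 0.0
    seen = {states[0]}
    revisits = 0
    for s in states[1:]:
        if s in seen:
            revisits += 1
        seen.add(s)
    return revisits / transitions


# A hypothetical grid-world episode recorded as (row, col) positions.
episode = [(0, 0), (0, 0), (0, 1), (0, 0), (1, 0)]
print(ineffective_action_rate(episode))  # 0.25
print(loop_rate(episode))                # 0.5
```

Under these assumed definitions, a mechanism such as A2EM would be expected to lower both numbers over the course of training; how the paper actually measures them is not stated in the abstract.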

Discovering and Exploiting Sparse Rewards in a Learned Behavior Space

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

Sparse Reward Exploration via Novelty Search and Emitters

A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards

Dealing with Sparse Rewards in Reinforcement Learning

Reward Space Noise for Exploration in Deep Reinforcement Learning

Efficient and Scalable Exploration Via Estimation-Error

Dynamic Subgoal-based Exploration via Bayesian Optimization

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Underexplored Subspace Mining for Sparse-Reward Cooperative Multi-Agent Reinforcement Learning

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Selective Learning for Sample-Efficient Training in Multi-Agent Sparse Reward Tasks

Reward-Free Exploration for Reinforcement Learning

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization

Knowledge is reward: Learning optimal exploration by predictive reward cashing

CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings

Hierarchical reinforcement learning for efficient exploration and transfer

Self-Supervised Online Reward Shaping in Sparse-Reward Environments