Abstract: Modular Reinforcement Learning decomposes a monolithic task into several tasks with sub-goals and learns each one in parallel to solve the original problem. Such learning patterns can be traced in the brains of animals. Recent evidence in neuroscience shows that animals utilize separate systems for processing rewards and punishments, offering a different perspective on modularizing Reinforcement Learning tasks. MaxPain and its deep variant, Deep MaxPain, demonstrated the advantages of such a dichotomy-based decomposition architecture over conventional Q-learning in terms of safety and learning efficiency. The two methods differ in policy derivation. MaxPain linearly unified the reward and punishment value functions and generated a joint policy from the unified values; Deep MaxPain tackled scaling problems in high-dimensional cases by linearly forming a joint policy from two sub-policies obtained from their respective value functions. However, the mixing weights in both methods were determined manually, so the learned modules were not used adequately. In this work, we discuss how the signal scaling of reward and punishment relates to the discount factor γ, and propose a weak constraint for signal design. To further exploit the learned models, we propose a state-value dependent weighting scheme that automatically tunes the mixing weights, with two variants, hard-max and softmax, based on a case analysis of the Boltzmann distribution. We focus on maze-solving navigation tasks and investigate how two metrics (pain-avoiding and goal-reaching) influence each other during learning. We propose a sensor fusion network structure that utilizes both lidar and images captured by a monocular camera instead of lidar-only or image-only sensing. Our results, both in simulation on three types of mazes with different complexities and in a real robot experiment in an L-maze on a Turtlebot3 Waffle Pi, showed the improvements of our methods.
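The idea of mixing two sub-policies with state-value dependent weights can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names (`softmax`, `mixing_weights`, `joint_policy`), the use of the maximum Q-value as each module's state value, the treatment of pain values as non-negative magnitudes, and the shared temperature parameter are all hypothetical and may differ from the paper's actual method.

```python
import math


def softmax(values, temperature=1.0):
    """Boltzmann distribution over a list of values (numerically stable)."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixing_weights(v_reward, v_pain, temperature=1.0, hard=False):
    """Return (w_reward, w_pain), summing to 1, from the two state values.

    Assumption: the weights are a Boltzmann distribution over the two
    state values; hard-max is treated as its low-temperature limit.
    """
    if hard:
        return (1.0, 0.0) if v_reward >= v_pain else (0.0, 1.0)
    w_reward, w_pain = softmax([v_reward, v_pain], temperature)
    return w_reward, w_pain


def joint_policy(q_reward, q_pain, temperature=1.0, hard=False):
    """Linearly mix a reward-seeking and a pain-avoiding sub-policy."""
    # The reward module prefers actions with high reward values.
    pi_reward = softmax(q_reward, temperature)
    # The pain module prefers actions with low pain values (assumed >= 0).
    pi_pain = softmax([-q for q in q_pain], temperature)
    # Assumption: each module's state value is the max over its Q-values.
    v_reward, v_pain = max(q_reward), max(q_pain)
    w_r, w_p = mixing_weights(v_reward, v_pain, temperature, hard)
    return [w_r * pr + w_p * pp for pr, pp in zip(pi_reward, pi_pain)]
```

Under these assumptions, a state where the anticipated pain exceeds the anticipated reward shifts weight toward the pain-avoiding sub-policy, so a manual mixing weight is no longer needed; with `hard=True` the more salient module takes over entirely.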

Success-Rate Targeted Reinforcement Learning by Disorientation Penalty

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Inverse Reinforcement Learning with Multiple Ranked Experts

Self Punishment and Reward Backfill for Deep Q-Learning

Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards

Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

An Incremental Optimization Approach to Address the Spatiotemporal Reward Coupling Effects in Deep Reinforcement Learning for Path Planning

Modular deep reinforcement learning from reward and punishment for robot navigation

Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

Towards Safe Reinforcement Learning Via Constraining Conditional Value-at-Risk

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Orientation-Preserving Rewards’ Balancing in Reinforcement Learning

Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points

Behavior Alignment via Reward Function Optimization

Survival-Oriented Reinforcement Learning Model: an Efficient and Robust Deep Reinforcement Learning Algorithm for Autonomous Driving Problem