Abstract: Modular Reinforcement Learning decomposes a monolithic task into several sub-tasks with their own sub-goals and learns them in parallel to solve the original problem. Similar learning patterns have been observed in animal brains. Recent evidence in neuroscience shows that animals use separate systems for processing rewards and punishments, which suggests a different way to modularize Reinforcement Learning tasks. MaxPain and its deep variant, Deep MaxPain, demonstrated the advantages of this dichotomy-based decomposition over conventional Q-learning in terms of safety and learning efficiency. The two methods differ in how the policy is derived. MaxPain linearly combines the reward and punishment value functions and derives a joint policy from the combined values; Deep MaxPain addresses scaling problems in high-dimensional cases by linearly combining two sub-policies, each obtained from its own value function, into a joint policy. However, the mixing weights in both methods were set manually, so the learned modules were not fully exploited. In this work, we discuss how the scaling of reward and punishment signals relates to the discount factor γ, and propose a weak constraint for signal design. To better exploit the learned modules, we propose a state-value-dependent weighting scheme that tunes the mixing weights automatically, with hard-max and softmax variants derived from a case analysis of the Boltzmann distribution. We focus on maze-solving navigation tasks and investigate how two metrics (pain-avoiding and goal-reaching) influence each other during learning. We also propose a sensor fusion network structure that uses both lidar and images captured by a monocular camera, rather than lidar-only or image-only sensing. Our results, in simulations of three types of mazes of differing complexity and in a real-robot experiment in an L-maze with a Turtlebot3 Waffle Pi, show the improvements achieved by our methods.
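To make the idea of state-value-dependent mixing concrete, the following is a minimal sketch of one possible reading of the scheme, not the formulation used in the paper. It assumes tabular action values for a single state, treats the punishment values as non-negative magnitudes, takes each module's state value as the maximum of its action values, and uses a shared temperature; the function names `boltzmann`, `mixing_weights`, and `joint_policy` are hypothetical.

```python
import math


def boltzmann(values, temperature=1.0):
    """Boltzmann (softmax) distribution over a list of values."""
    m = max(values)  # shift by the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixing_weights(v_reward, v_pain, temperature=1.0, hard=False):
    """Return (w_reward, w_pain) computed from the two state values.

    Assumption: v_pain is a non-negative punishment magnitude, so a larger
    v_pain shifts weight toward the pain-avoiding module.
    """
    if hard:
        # Hard-max: all weight goes to the module with the larger state value.
        return (1.0, 0.0) if v_reward >= v_pain else (0.0, 1.0)
    w = boltzmann([v_reward, v_pain], temperature)
    return w[0], w[1]


def joint_policy(q_reward, q_pain, temperature=1.0, hard=False):
    """Linearly mix two sub-policies with state-value-dependent weights."""
    pi_reward = boltzmann(q_reward, temperature)             # prefer high reward
    pi_pain = boltzmann([-q for q in q_pain], temperature)   # prefer low pain
    # Assumed state values: the maximum action value of each module.
    w_r, w_p = mixing_weights(max(q_reward), max(q_pain), temperature, hard)
    return [w_r * a + w_p * b for a, b in zip(pi_reward, pi_pain)]


if __name__ == "__main__":
    q_r = [1.0, 0.0, 0.5]   # illustrative reward action values
    q_p = [0.0, 2.0, 0.1]   # illustrative punishment action values
    print(joint_policy(q_r, q_p))
    print(joint_policy(q_r, q_p, hard=True))
```

In this sketch the hard-max variant is the zero-temperature limit of the softmax weights: as the temperature approaches zero, the Boltzmann distribution over the two state values puts all of its mass on the larger one.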

Latent Exploration for Reinforcement Learning

Random Latent Exploration for Deep Reinforcement Learning

A Temporally Correlated Latent Exploration for Reinforcement Learning

LASER: Learning a Latent Action Space for Efficient Reinforcement Learning

Model-Based Reinforcement Learning via Latent-Space Collocation

Deep Exploration with PAC-Bayes

Action space noise optimization as exploration in deterministic policy gradient for locomotion tasks

Learning Latent Plans from Play

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments from Temporal Logic Specifications

Modular deep reinforcement learning from reward and punishment for robot navigation

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Latent Context Based Soft Actor-Critic

Efficient Planning with Latent Diffusion

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

LIRL: Latent Imagination-Based Reinforcement Learning for Efficient Coverage Path Planning

LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion