Abstract: Modular Reinforcement Learning decomposes a monolithic task into several sub-tasks with their own sub-goals and learns them in parallel to solve the original problem. Such learning patterns can be traced in the brains of animals. Recent evidence in neuroscience shows that animals use separate systems for processing rewards and punishments, which suggests a different perspective for modularizing Reinforcement Learning tasks. MaxPain and its deep variant, Deep MaxPain, showed the advantages of such a dichotomy-based decomposition architecture over conventional Q-learning in terms of safety and learning efficiency. The two methods differ in policy derivation. MaxPain linearly unifies the reward and punishment value functions and generates a joint policy from the unified values; Deep MaxPain tackles scaling problems in high-dimensional cases by linearly forming a joint policy from two sub-policies obtained from their respective value functions. However, the mixing weights in both methods were set manually, so the learned modules were not used adequately. In this work, we discuss how the scaling of the reward and punishment signals relates to the discount factor γ, and propose a weak constraint for signal design. To further exploit the learned models, we propose a state-value-dependent weighting scheme that automatically tunes the mixing weights, with hard-max and softmax variants based on a case analysis of the Boltzmann distribution. We focus on maze-solving navigation tasks and investigate how the two metrics (pain-avoiding and goal-reaching) influence each other during learning. We also propose a sensor fusion network structure that uses lidar together with images captured by a monocular camera, instead of lidar-only or image-only sensing. Our results, both in simulation on three types of mazes of different complexity and in a real-robot experiment in an L-maze on a Turtlebot3 Waffle Pi, demonstrate the improvements achieved by our methods.
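The policy-mixing idea summarized above can be illustrated with a minimal sketch. It is not the formulation used in the paper: it assumes that pain values are stored as non-negative magnitudes (so the pain-avoiding sub-policy prefers actions with low pain value), that each state value is the maximum of the corresponding action values, and that the softmax weight is a two-way Boltzmann distribution over the two state values. The names `boltzmann`, `pain_weight`, and `joint_policy`, the temperature values, and the numbers are all hypothetical.

```python
import math


def boltzmann(prefs, temperature=1.0):
    """Boltzmann (softmax) distribution over a list of preferences."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp((p - m) / temperature) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def pain_weight(v_reward, v_pain, temperature=None):
    """Mixing weight on the pain-avoiding sub-policy (assumed form).

    temperature=None selects the hard-max variant; otherwise the weight
    is the softmax probability assigned to the pain state value.
    """
    if temperature is None:
        return 1.0 if v_pain > v_reward else 0.0
    return boltzmann([v_reward, v_pain], temperature)[1]


def joint_policy(q_reward, q_pain, w, temperature=1.0):
    """Linearly mix a goal-reaching and a pain-avoiding sub-policy."""
    pi_reward = boltzmann(q_reward, temperature)
    # Assumption: avoiding pain means preferring actions with low pain value.
    pi_pain = boltzmann([-q for q in q_pain], temperature)
    return [(1.0 - w) * r + w * p for r, p in zip(pi_reward, pi_pain)]


# Hypothetical action values for one state with three actions.
q_reward = [1.0, 0.5, 0.2]
q_pain = [0.9, 0.1, 0.1]
v_reward, v_pain = max(q_reward), max(q_pain)

w_hard = pain_weight(v_reward, v_pain)                   # hard-max variant
w_soft = pain_weight(v_reward, v_pain, temperature=0.5)  # softmax variant
policy = joint_policy(q_reward, q_pain, w_soft)
```

In this sketch the weight is recomputed in every state from the two state values, which replaces a manually chosen constant; lowering the temperature moves the softmax variant toward the hard-max one.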
