Abstract: A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the reinforcement learning (RL) agent's behavior as the environment changes over its lifetime while minimizing catastrophic forgetting of what it has already learned. To address this challenge, we propose DaCoRL (dynamics-adaptive continual RL). DaCoRL learns a context-conditioned policy using progressive contextualization, which incrementally clusters a stream of stationary tasks in the dynamic environment into a series of contexts and uses an expandable multihead neural network to approximate the policy. Specifically, we define a set of tasks with similar dynamics as an environmental context and formalize context inference as online Bayesian infinite Gaussian mixture clustering on environment features, using online Bayesian inference to infer the posterior distribution over contexts. Under a Chinese restaurant process (CRP) prior, this technique can accurately classify the current task as a previously seen context or instantiate a new context as needed, without relying on any external indicator to signal environmental changes in advance. Furthermore, we employ an expandable multihead neural network whose output layer is expanded synchronously with each newly instantiated context, together with a knowledge distillation regularization term that retains performance on learned tasks. As a general framework that can be coupled with various deep RL algorithms, DaCoRL consistently outperforms existing methods in terms of stability, overall performance, and generalization ability, as verified by extensive experiments on several robot navigation and MuJoCo locomotion tasks.
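To make the context-inference idea more concrete, the following is a minimal sketch of clustering under a CRP prior; it is not the DaCoRL implementation. It assumes isotropic Gaussian contexts with a known standard deviation, a zero-mean Gaussian prior over context means, and hard (maximum a posteriori) assignment in place of the full online Bayesian inference described in the abstract. The class name `CRPContextClusterer`, its parameters (`alpha`, `sigma`, `prior_sigma`), and the two-dimensional feature vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


class CRPContextClusterer:
    """Toy online clustering with a CRP prior (illustrative assumption only)."""

    def __init__(self, alpha=1.0, sigma=0.5, prior_sigma=5.0):
        self.alpha = alpha              # CRP concentration (hypothetical value)
        self.sigma = sigma              # within-context std, assumed known
        self.prior_sigma = prior_sigma  # std of the zero-mean prior over context means
        self.means = []                 # running mean of each context
        self.counts = []                # number of tasks assigned to each context

    @staticmethod
    def _log_gauss(x, mean, std):
        # Log density of an isotropic Gaussian N(mean, std^2 I) at x.
        d = x.size
        return -0.5 * (d * np.log(2 * np.pi * std**2)
                       + np.sum((x - mean) ** 2) / std**2)

    def posterior(self, x):
        """Posterior over existing contexts plus one candidate new context."""
        x = np.asarray(x, dtype=float)
        n = sum(self.counts)
        log_p = []
        for mean, count in zip(self.means, self.counts):
            # CRP prior for an existing context is proportional to its count.
            log_prior = np.log(count / (n + self.alpha))
            log_p.append(log_prior + self._log_gauss(x, mean, self.sigma))
        # CRP prior for a new context is proportional to alpha; the likelihood
        # is the marginal of x under the prior over context means.
        new_std = np.sqrt(self.prior_sigma**2 + self.sigma**2)
        log_prior_new = np.log(self.alpha / (n + self.alpha))
        log_p.append(log_prior_new
                     + self._log_gauss(x, np.zeros_like(x), new_std))
        log_p = np.array(log_p)
        p = np.exp(log_p - log_p.max())  # subtract max for numerical stability
        return p / p.sum()

    def assign(self, x):
        """Assign x to the most probable context, creating one if needed."""
        x = np.asarray(x, dtype=float)
        k = int(np.argmax(self.posterior(x)))
        if k == len(self.means):
            # The "new context" option won: instantiate it.
            self.means.append(x.copy())
            self.counts.append(1)
        else:
            # Update the running mean of the chosen context.
            self.counts[k] += 1
            self.means[k] += (x - self.means[k]) / self.counts[k]
        return k


# Hypothetical environment features for a stream of five tasks.
features = [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 0.1]]
clusterer = CRPContextClusterer()
labels = [clusterer.assign(f) for f in features]
print(labels)
```

In this sketch no external signal marks when the environment changes: a new context is instantiated only when the "new context" option has the highest posterior. In a framework like the one described above, that event would also be the point at which a new output head is added to the policy network.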

A Goal-Conditioned Reinforcement Learning Algorithm with Environment Modeling

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

Predictive Hierarchical Reinforcement Learning for Path-Efficient Mapless Navigation with Moving Target

Adaptive Environment Modeling Based Reinforcement Learning for Collision Avoidance in Complex Scenes

Autonomous navigation of mobile robots in unknown environments using off-policy reinforcement learning with curriculum learning

Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning

Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning

Learning World Models for Unconstrained Goal Navigation

Autonomous Learning and Navigation of Mobile Robots Based on Deep Reinforcement Learning

Self-imitation guided goal-conditioned reinforcement learning

Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

Visual Navigation with Multiple Goals Based on Deep Reinforcement Learning

Improving Generalization in Reinforcement Learning Training Regimes for Social Robot Navigation

TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Deep Reinforcement Learning with Heuristic Corrections for UGV Navigation

Dynamics-Adaptive Continual Reinforcement Learning Via Progressive Contextualization

Robot Subgoal-guided Navigation in Dynamic Crowded Environments with Hierarchical Deep Reinforcement Learning