Abstract: Accurately learning task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability to extract task-relevant information. However, prediction-, contrast-, and reconstruction-based approaches lack appropriate mechanisms for extracting task information, and bisimulation-based methods are limited in domains with sparse rewards, so it remains difficult to extend these methods effectively to environments with distractions. To alleviate these problems, in this paper we incorporate action sequences, which contain task-intensive signals, into representation learning. Specifically, we propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to preserve only the components that follow the control signals of sequential actions, so that the agent is induced to learn representations that are robust to distractions. We conduct extensive experiments on DeepMind Control suite tasks with distractions, where SAR achieves the best performance over strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by deploying SAR to real-world CARLA-based autonomous driving with natural distractions. Finally, we analyze generalization using generalization decay and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.
