Learning State Representations via Retracing in Reinforcement Learning

Changmin Yu,Dong Li,Jianye Hao,Jun Wang,Neil Burgess

DOI: https://doi.org/10.48550/arXiv.2111.12600

2022-09-24

Abstract: We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks. In addition to the predictive (reconstruction) supervision in the forward direction, we propose to include "retraced" transitions for representation / model learning, by enforcing the cycle-consistency constraint between the original and retraced states, hence improving the sample efficiency of learning. Moreover, learning via retracing explicitly propagates information about future transitions backward for inferring previous states, thus facilitating stronger representation learning for the downstream reinforcement learning tasks. We introduce Cycle-Consistency World Model (CCWM), a concrete model-based instantiation of learning via retracing. Additionally, we propose a novel adaptive "truncation" mechanism for counteracting the negative impacts brought by "irreversible" transitions such that learning via retracing can be maximally effective. Through extensive empirical studies on visual-based continuous control benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in terms of sample efficiency and asymptotic performance, whilst exhibiting behaviours that are indicative of stronger representation learning.

Machine Learning

What problem does this paper attempt to address?

This paper addresses how to learn state representations more effectively in reinforcement learning. It proposes a method called "learning via retracing", which supplements the usual forward predictive (reconstruction) supervision with "retraced" transitions that go backward from future states to past states. Enforcing a cycle-consistency constraint between each original state and its retraced counterpart improves sample efficiency. Retracing also explicitly propagates information about future transitions backward to infer previous states, which yields stronger state representations for downstream reinforcement learning tasks. The paper introduces a concrete model-based instantiation, the Cycle-Consistency World Model (CCWM), together with a novel adaptive truncation mechanism that counteracts the negative impact of irreversible transitions so that learning via retracing remains effective. In extensive experiments on visual-based continuous control benchmarks, CCWM achieves state-of-the-art sample efficiency and asymptotic performance, and exhibits behaviours indicative of stronger representation learning.
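The cycle-consistency idea described above can be illustrated with a small NumPy sketch. This is not the paper's CCWM implementation: the function name `retrace_loss`, the `forward` and `backward` callables, and the fixed `threshold` mask are all hypothetical choices made for illustration. In particular, the fixed threshold is a simplified stand-in for the paper's adaptive truncation mechanism, whose details are not given in this summary.

```python
import numpy as np

def retrace_loss(z_t, a_t, z_next, forward, backward, threshold=None):
    """Toy forward-prediction + cycle-consistency loss (illustrative only).

    z_t, z_next: arrays of shape (batch, dim) with current and next states.
    a_t:         array of shape (batch, dim_a) with actions.
    forward:     hypothetical callable (z, a) -> predicted next state.
    backward:    hypothetical callable (z_next, a) -> retraced previous state.
    threshold:   assumed fixed cutoff; transitions whose cycle error exceeds
                 it are treated as irreversible and masked out. This is a
                 stand-in, not the paper's adaptive truncation.
    """
    z_pred = forward(z_t, a_t)            # forward prediction
    z_retraced = backward(z_pred, a_t)    # retrace back towards z_t

    # Per-sample squared errors.
    forward_err = np.sum((z_pred - z_next) ** 2, axis=-1)
    cycle_err = np.sum((z_retraced - z_t) ** 2, axis=-1)

    if threshold is None:
        mask = np.ones_like(cycle_err)
    else:
        mask = (cycle_err <= threshold).astype(cycle_err.dtype)

    # Average the cycle term over the transitions that were kept.
    cycle_term = np.sum(mask * cycle_err) / max(np.sum(mask), 1.0)
    return forward_err.mean() + cycle_term, mask


# Toy 1-D dynamics with a wall at 1.0: moving into the wall is irreversible.
wall_forward = lambda z, a: np.minimum(z + a, 1.0)
naive_backward = lambda z, a: z - a

z = np.array([[0.0], [0.9]])
a = np.array([[0.5], [0.5]])
z_next = wall_forward(z, a)

loss_all, mask_all = retrace_loss(z, a, z_next, wall_forward, naive_backward)
loss_cut, mask_cut = retrace_loss(z, a, z_next, wall_forward, naive_backward,
                                  threshold=0.01)
```

In this toy setup the second transition hits the wall, so retracing cannot recover the original state. Without masking, that transition contributes a non-zero cycle error. With the threshold applied, it is excluded from the cycle term, which mirrors the motivation for handling irreversible transitions separately.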

State Representation Learning for Effective Deep Reinforcement Learning.

Learning explainable task-relevant state representation for model-free deep reinforcement learning

Bootstrapped Representations in Reinforcement Learning

Bridging State and History Representations: Understanding Self-Predictive RL

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

State Representation Learning with Adjacent State Consistency Loss for Deep Reinforcement Learning.

Data-efficient Model-Based Reinforcement Learning with Trajectory Discrimination

Effective Representation Learning is More Effective in Reinforcement Learning Than You Think

Predictive Experience Replay for Continual Visual Control and Forecasting

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Towards Control-Centric Representations in Reinforcement Learning from Images

Learning a World Model With Multitimescale Memory Augmentation

State Chrono Representation for Enhancing Generalization in Reinforcement Learning

Enhancing Visual Reinforcement Learning with State-Action Representation

A Reliable Representation with Bidirectional Transition Model for Visual Reinforcement Learning Generalization

Learning consistent representations with temporal and causal enhancement for knowledge tracing

Simplified Temporal Consistency Reinforcement Learning

Learning Latent Dynamic Robust Representations for World Models

Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning