Learning State Representations via Retracing in Reinforcement Learning

Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess
DOI: https://doi.org/10.48550/arXiv.2111.12600
2022-09-24
Abstract: We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks. In addition to the predictive (reconstruction) supervision in the forward direction, we propose to include "retraced" transitions for representation / model learning, by enforcing the cycle-consistency constraint between the original and retraced states, hence improving the sample efficiency of learning. Moreover, learning via retracing explicitly propagates information about future transitions backward for inferring previous states, thus facilitating stronger representation learning for the downstream reinforcement learning tasks. We introduce the Cycle-Consistency World Model (CCWM), a concrete model-based instantiation of learning via retracing. Additionally, we propose a novel adaptive "truncation" mechanism for counteracting the negative impacts brought by "irreversible" transitions, such that learning via retracing can be maximally effective. Through extensive empirical studies on visual-based continuous control benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in terms of sample efficiency and asymptotic performance, whilst exhibiting behaviours that are indicative of stronger representation learning.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to learn state representations more effectively in reinforcement learning tasks. Specifically, it proposes a method named "learning via retracing", which strengthens state representation learning by adding backward transitions (i.e., retracing from future states to past states). The method uses the usual forward-prediction supervision and, in addition, improves sample efficiency by enforcing a cycle-consistency constraint between the original state and the retraced state. Because retracing explicitly propagates information about future transitions backward to infer previous states, it also provides stronger state representations for downstream reinforcement learning tasks. The paper further proposes a concrete model instance, the Cycle-Consistency World Model (CCWM), and a novel adaptive truncation mechanism that counteracts the negative impact of irreversible transitions so that learning via retracing is maximally effective. Through extensive empirical studies on visual-based continuous control benchmarks, the paper shows that CCWM reaches state-of-the-art sample efficiency and asymptotic performance, and exhibits behaviour indicative of stronger state representation learning.
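
The cycle-consistency idea described above can be illustrated with a minimal sketch. This is not the CCWM implementation: the linear forward and backward latent models, the weight matrices W_f and W_b, and the boolean keep_mask (a crude stand-in for the paper's adaptive truncation of irreversible transitions) are all assumptions introduced purely for illustration.

```python
import numpy as np

def forward_model(z, a, W_f):
    """Hypothetical linear forward latent dynamics: z_next = z + a @ W_f."""
    return z + a @ W_f

def backward_model(z_next, a, W_b):
    """Hypothetical linear 'retracing' dynamics: z_prev = z_next - a @ W_b."""
    return z_next - a @ W_b

def cycle_consistency_loss(z, a, W_f, W_b, keep_mask=None):
    """Mean squared distance between original and retraced latent states.

    keep_mask (assumed, not from the paper) marks transitions treated as
    reversible; masked-out transitions contribute nothing to the loss.
    """
    z_next = forward_model(z, a, W_f)               # step forward in latent space
    z_retraced = backward_model(z_next, a, W_b)     # retrace back to the start
    per_sample = np.sum((z - z_retraced) ** 2, axis=-1)
    if keep_mask is None:
        return float(per_sample.mean())
    keep = keep_mask.astype(float)
    return float((per_sample * keep).sum() / max(keep.sum(), 1.0))

# Toy batch: 4 latent states of dimension 3, actions of dimension 2.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))
a = rng.normal(size=(4, 2))
W = rng.normal(size=(2, 3))

# A backward model that exactly inverts the forward model gives ~zero loss;
# a mismatched backward model is penalised.
loss_matched = cycle_consistency_loss(z, a, W, W)
loss_mismatched = cycle_consistency_loss(z, a, W, np.zeros_like(W))
```

In a learned model the two networks would be trained jointly, with this term added to the forward prediction loss; the mask here only shows where a truncation decision would enter the objective, not how the paper computes it.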