Abstract: Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine DRL with self-supervised learning or data augmentation to improve sample efficiency. Although understanding the temporal dynamics of the environment is important for effective learning, many of these methods do not account for them. To address the sample efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI learns state representations through an auxiliary task that uses gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, which improves the learning of environmental dynamics compared to using either model separately. In this hierarchy, inverse dynamics modeling takes its inputs from forward dynamics modeling rather than directly from the encoder. Because forward dynamics modeling focuses on extracting features related to the controllable state, it filters out noisy information, and the resulting denoised inputs make the training of inverse dynamics modeling more stable. We demonstrate the effectiveness of DynaSTI through experiments on the Atari game benchmark, limiting environment interactions to 100k steps. Our extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and nearing human-level performance.
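The hierarchical arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer sizes, loss functions, or exactly which forward-model outputs feed the inverse model. The sketch assumes that a GRU over encoder latents acts as the shared temporal feature extractor, that the forward model predicts the next latent from the GRU feature and the action, and that the inverse model predicts the action from the GRU features of two consecutive steps instead of from raw encoder outputs. All names (`GRUCell`, `dynamics_losses`, `W_fwd`, `W_inv`) are hypothetical, biases are omitted, and random vectors stand in for encoder outputs.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class GRUCell:
    """Bias-free GRU cell operating on plain Python lists."""

    def __init__(self, input_dim, hidden_dim, rng):
        d = input_dim + hidden_dim
        self.Wz = rand_matrix(hidden_dim, d, rng)  # update gate
        self.Wr = rand_matrix(hidden_dim, d, rng)  # reset gate
        self.Wh = rand_matrix(hidden_dim, d, rng)  # candidate state

    def step(self, x, h):
        xh = x + h  # list concatenation
        z = [sigmoid(v) for v in matvec(self.Wz, xh)]
        r = [sigmoid(v) for v in matvec(self.Wr, xh)]
        x_rh = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.Wh, x_rh)]
        # Blend previous state and candidate using the update gate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def dynamics_losses(latents, actions, num_actions, gru, W_fwd, W_inv, hidden_dim):
    """Return (forward MSE, inverse cross-entropy) averaged over one trajectory.

    latents: encoder outputs z_0..z_T; actions: a_0..a_{T-1}.
    """
    # Temporal features: run the GRU over the latent sequence.
    h = [0.0] * hidden_dim
    feats = []
    for z in latents:
        h = gru.step(z, h)
        feats.append(h)

    fwd_loss = inv_loss = 0.0
    for t, a in enumerate(actions):
        # Forward model: predict z_{t+1} from the temporal feature and action.
        pred = matvec(W_fwd, feats[t] + one_hot(a, num_actions))
        target = latents[t + 1]
        fwd_loss += sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)
        # Inverse model: predict a_t from forward-pathway features of steps
        # t and t+1 (assumption), not from the raw encoder outputs.
        probs = softmax(matvec(W_inv, feats[t] + feats[t + 1]))
        inv_loss -= math.log(probs[a])
    n = len(actions)
    return fwd_loss / n, inv_loss / n


# Toy usage with random stand-ins for encoder latents and actions.
rng = random.Random(0)
latent_dim, hidden_dim, num_actions, T = 4, 6, 3, 5
gru = GRUCell(latent_dim, hidden_dim, rng)
W_fwd = rand_matrix(latent_dim, hidden_dim + num_actions, rng)
W_inv = rand_matrix(num_actions, 2 * hidden_dim, rng)
latents = [[rng.uniform(-1, 1) for _ in range(latent_dim)] for _ in range(T + 1)]
actions = [rng.randrange(num_actions) for _ in range(T)]
fwd_loss, inv_loss = dynamics_losses(
    latents, actions, num_actions, gru, W_fwd, W_inv, hidden_dim
)
print(fwd_loss, inv_loss)
```

In a real agent, both losses would presumably be added to the reinforcement learning objective as auxiliary terms and minimized by gradient descent. The sketch only computes them once with untrained weights to show the data flow.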

Simplified Temporal Consistency Reinforcement Learning

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Reinforcement learning under temporal logic constraints as a sequence modeling problem

Reinforcement learning under temporal logic constraints as a sequence modelling problem

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Episodic Reinforcement Learning with Expanded State-reward Space

Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Deep Reinforcement Learning with Temporal Logics

Self-Attention Based Temporal Intrinsic Reward for Reinforcement Learning

Representation learning for continuous action spaces is beneficial for efficient policy learning

Sample Efficient Deep Reinforcement Learning with Online State Abstraction and Causal Transformer Model Prediction

Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Temporal Shift Reinforcement Learning

Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning

Learning Parsimonious Dynamics for Generalization in Reinforcement Learning

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

DynaSTI: Dynamics Modeling with Sequential Temporal Information for Reinforcement Learning in Atari

Combining long and short spatiotemporal reasoning for deep reinforcement learning

Bootstrapped Representations in Reinforcement Learning