For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Scott Fujimoto, Wei-Di Chang, Edward J. Smith, Shixiang Shane Gu, Doina Precup, David Meger
2023-11-06
Abstract: In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the low sample efficiency of reinforcement learning (RL). Specifically, it argues that most off-policy RL algorithms learn from a weak signal because they rely on the Bellman equation, which makes learning inefficient. To address this, the paper proposes SALE (State-Action Representation Learning), a method for learning embeddings from low-level states that improves the performance of RL algorithms.

### Main Contributions

1. **SALE Method**: Learns joint state-action embeddings by modeling the environment dynamics in a latent space, which enables effective representation learning even from low-level states.
2. **Design Study**: An extensive empirical study of the embedding design space that identifies the design choices with the largest effect on final performance.
3. **TD7 Algorithm**: Combines SALE with TD3 and adds policy checkpoints, prioritized experience replay, and a behavior cloning term (used in the offline setting) to form TD7. On OpenAI Gym benchmark tasks, TD7 significantly outperforms existing continuous control algorithms, with average performance gains of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively.

### Key Techniques

1. **State-Action Joint Embedding**: Two encoders, \( f \) and \( g \), produce a state embedding \( z_s = f(s) \) and a state-action embedding \( z_{sa} = g(z_s, a) \). The embeddings are trained so that \( z_{sa} \) predicts the embedding of the next state, \( z_{s'} \).
2. **Normalized Embedding**: An AvgL1Norm layer, which divides a vector by the average absolute value of its entries, keeps the relative scale of the embeddings constant throughout learning.
3. **Fixed Embedding**: To prevent instability caused by inputs that shift as the encoders train, the embeddings used to train the current value function and policy network are frozen.
4. **Clipped Values**: Extrapolation error is reduced by clipping target values to the range of values observed during training. This matters especially in online RL, where enlarging the state-action input (here, with the learned embeddings) tends to increase extrapolation error.
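Three of the techniques above reduce to a few lines of arithmetic each. The block below is a minimal, framework-free sketch in plain Python, assuming embeddings are plain lists and Q-values plain floats rather than batched tensors. The function names, the `eps` guard, and the argument layout are hypothetical illustrations, not the paper's code. `sale_dynamics_loss` expresses the latent-dynamics objective as a mean squared error between \( z_{sa} \) and \( z_{s'} \), and `clipped_td_target` combines the TD3 minimum over two target critics with value clipping.

```python
from typing import List, Sequence


def avg_l1_norm(x: Sequence[float], eps: float = 1e-8) -> List[float]:
    """AvgL1Norm: divide a vector by the mean absolute value of its entries.

    After normalization the entries have an average magnitude of 1, so the
    embedding keeps a consistent scale however large the raw values grow.
    """
    scale = sum(abs(v) for v in x) / len(x)
    scale = max(scale, eps)  # assumption: guard against an all-zero vector
    return [v / scale for v in x]


def sale_dynamics_loss(z_sa: Sequence[float], z_next: Sequence[float]) -> float:
    """Mean squared error between the state-action embedding z_sa and the
    next-state embedding z_next = f(s'), which z_sa is trained to predict.
    In an autodiff framework, z_next would be held fixed (no gradient)."""
    return sum((p - t) ** 2 for p, t in zip(z_sa, z_next)) / len(z_sa)


def clipped_td_target(
    reward: float,
    not_done: float,
    gamma: float,
    q1_next: float,
    q2_next: float,
    q_min: float,
    q_max: float,
) -> float:
    """TD3-style target using the minimum of two target critics, with the
    next-state value clipped to [q_min, q_max] (assumed here to be the range
    of target values observed so far during training)."""
    q_next = min(q1_next, q2_next)           # clipped double-Q, as in TD3
    q_next = min(max(q_next, q_min), q_max)  # value clipping against extrapolation error
    return reward + not_done * gamma * q_next


# Example usage
print(avg_l1_norm([2.0, -4.0, 6.0]))  # [0.5, -1.0, 1.5]
print(clipped_td_target(1.0, 1.0, 0.5, 150.0, 120.0, 0.0, 100.0))  # 51.0
```

In a deep learning framework, the same operations would run on batched tensors, and the dynamics loss would be backpropagated through both encoders.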
### Experimental Results

- **Performance Improvement**: TD7 significantly outperforms existing methods on multiple benchmark tasks without added complexity such as large ensembles, additional updates per time step, or per-environment hyperparameter tuning.
- **Stability**: With checkpointing, TD7 evaluates the best-performing policy found during training rather than the most recent one, which makes test-time performance more stable.

### Conclusion

The paper tackles low sample efficiency in reinforcement learning with the SALE method and the TD7 algorithm, making significant progress on representation learning from low-level states. TD7 improves performance, yields more stable evaluation through checkpointing, and works in both the online and offline settings.
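Checkpointing, as summarized above, amounts to remembering the best policy seen during training and using it at test time. A minimal sketch follows, assuming each candidate policy is held fixed for a few evaluation episodes and scored by its worst episode return. The class name `CheckpointSelector`, the minimum-return scoring rule, and the placeholder string "policies" are illustrative assumptions, not the paper's implementation.

```python
import copy
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CheckpointSelector:
    """Hypothetical helper that remembers the best policy seen during training."""

    best_score: float = float("-inf")
    best_policy: Optional[Any] = None

    def assess(self, policy: Any, episode_returns: List[float]) -> bool:
        """Score a policy that was held fixed for len(episode_returns) episodes.

        The score is the worst episode return (an assumption in this sketch),
        so a policy must perform well consistently to become the checkpoint.
        Returns True if the stored checkpoint was replaced.
        """
        score = min(episode_returns)
        if score > self.best_score:
            self.best_score = score
            # Store a copy so later training updates do not alter the checkpoint.
            self.best_policy = copy.deepcopy(policy)
            return True
        return False


# Example usage with placeholder "policies"
selector = CheckpointSelector()
selector.assess("policy_v1", [100.0, 90.0, 110.0])  # score 90: kept
selector.assess("policy_v2", [300.0, 20.0])         # score 20: rejected
selector.assess("policy_v3", [95.0, 120.0])         # score 95: kept
print(selector.best_policy)  # policy_v3
```

Scoring by the minimum rather than the mean is a deliberate choice in this sketch. It rejects a policy like `policy_v2`, whose high average hides one very poor episode, which matches the stability goal described above.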