Abstract: Reinforcement learning often requires extensive training data. Simulation-to-real transfer offers a promising approach to address this challenge in robotics. While differentiable simulators offer improved sample efficiency through exact gradients, they can be unstable in contact-rich environments and may lead to poor generalization. This paper introduces a novel approach integrating sharpness-aware optimization into gradient-based reinforcement learning algorithms. Our simulation results demonstrate that our method, tested on contact-rich environments, significantly enhances policy robustness to environmental variations and action perturbations while maintaining the sample efficiency of first-order methods. Specifically, our approach improves action noise tolerance compared to standard first-order methods and achieves generalization comparable to zeroth-order methods. This improvement stems from finding flatter minima in the loss landscape, associated with better generalization. Our work offers a promising solution to balance efficient learning and robust sim-to-real transfer in robotics, potentially bridging the gap between simulation and real-world performance.

What problem does this paper attempt to address?

The main problem this paper addresses is improving the generalization of robot locomotion policies, especially their performance when transferred from simulation to the real world. Specifically, the paper focuses on how to enhance the robustness and generalization of gradient-based reinforcement learning algorithms in contact-rich environments by introducing sharpness-aware optimization.

### Main Problems and Challenges

1. **Balance between sample efficiency and generalization**:
   - Reinforcement learning (RL) usually requires a large amount of training data, which may be infeasible to collect through online learning.
   - Simulation-to-real transfer is a promising way to address this challenge, but the gap between simulation and the real world remains a significant problem.
   - Differentiable simulators improve sample efficiency, but they may be unstable in contact-rich environments and may lead to poor generalization.
2. **Handling of non-smooth or discrete action spaces**:
   - Gradient-based methods usually provide more efficient parameter updates, but may perform poorly in non-smooth or discrete action spaces.
   - Zeroth-order methods handle these scenarios more easily, but may require more samples.
3. **Finding flat minima**:
   - The paper points out that finding flatter minima can improve generalization. Zeroth-order methods naturally tend towards these more robust solutions, while gradient-based methods may converge to sharp local minima, which are very sensitive to small perturbations.

### Solutions

To address the above challenges, the paper proposes a new method that integrates sharpness-aware minimization (SAM) into gradient-based reinforcement learning algorithms. Specifically:

- **SHAC-ASAM algorithm**: Combines the efficiency of the Short Horizon Actor-Critic (SHAC) algorithm with the robustness of Adaptive Sharpness-Aware Minimization (ASAM).
- **Experimental results**: Experiments show that SHAC-ASAM significantly enhances the robustness of the policy in the Ant and Humanoid environments, especially under action noise and in out-of-distribution environments.

### Summary

By introducing sharpness-aware optimization, this paper aims to balance efficient learning and robust simulation-to-real transfer, providing a promising solution for developing robot locomotion policies. The method maintains the sample efficiency of first-order methods while enhancing the generalization of the policy in contact-rich environments.
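
The sharpness-aware update mentioned under Solutions can be illustrated with a minimal sketch. It is not the paper's SHAC-ASAM implementation: a toy quadratic loss stands in for the policy objective, and the names `sam_perturbation`, `sam_step`, `CURVATURE`, `rho`, `eta`, and `lr` are assumptions made for illustration. The idea shown is the generic two-step SAM update: first move the parameters to an approximate worst-case point within a small neighborhood, then apply the gradient computed there to the original parameters. With `adaptive=True`, the neighborhood is rescaled per parameter by `|w| + eta`, following the ASAM formulation.

```python
import math

# Toy stand-in for the policy loss: L(w) = 0.5 * sum(c_i * w_i^2).
# The curvature values are hypothetical and only serve the illustration.
CURVATURE = [1.0, 10.0]

def loss(w):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURE, w))

def grad(w):
    return [c * x for c, x in zip(CURVATURE, w)]

def sam_perturbation(w, grad_fn, rho=0.05, adaptive=False, eta=0.01):
    """Approximate worst-case perturbation inside a neighborhood of size rho."""
    g = grad_fn(w)
    # ASAM scales each coordinate by |w_i| + eta; plain SAM uses a scale of 1.
    scale = [abs(x) + eta for x in w] if adaptive else [1.0] * len(w)
    scaled_g = [s * gi for s, gi in zip(scale, g)]
    norm = math.sqrt(sum(x * x for x in scaled_g)) + 1e-12
    # eps = rho * T^2 g / ||T g||, where T = diag(scale)
    return [rho * s * sg / norm for s, sg in zip(scale, scaled_g)]

def sam_step(w, grad_fn, lr=0.05, rho=0.05, adaptive=False, eta=0.01):
    """One sharpness-aware update: gradient taken at w + eps, applied at w."""
    eps = sam_perturbation(w, grad_fn, rho, adaptive, eta)
    perturbed = [x + e for x, e in zip(w, eps)]
    g_sharp = grad_fn(perturbed)
    return [x - lr * gs for x, gs in zip(w, g_sharp)]

# Usage: run a few adaptive steps from an arbitrary starting point.
w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w, grad, adaptive=True)
print(loss(w))
```

Each step costs two gradient evaluations instead of one, and `rho` controls how large a neighborhood the update looks at when searching for flatter regions of the loss landscape.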

Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments

Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Assessing Transferability From Simulation to Reality for Reinforcement Learning

A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

Learning Robust and Adaptive Real-World Continuous Control Using Simulation and Transfer Learning

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Generalization in Transfer Learning

Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models

Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data

Learning Quadrupedal Locomotion via Differentiable Simulation

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Sim-to-Real Transfer with Neural-Augmented Robot Simulation

Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies

Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild

i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Sim-to-Real Learning for Bipedal Locomotion Under Unsensed Dynamic Loads