Abstract: The sample inefficiency of reinforcement learning (RL) remains a significant challenge in robotics. RL requires large-scale simulation and, even then, can lead to long training times, slowing down research and innovation. This issue is particularly pronounced in vision-based control tasks where reliable state estimates are not accessible. Differentiable simulation offers an alternative by enabling gradient back-propagation through the dynamics model, providing low-variance analytical policy gradients and, hence, higher sample efficiency. However, its use for real-world robotic tasks has so far been limited. This work demonstrates the great potential of differentiable simulation for learning quadrotor control. We show that training in differentiable simulation significantly outperforms model-free RL in terms of both sample efficiency and training time, allowing a policy to learn to recover a quadrotor in seconds when provided with the vehicle state and in minutes when relying solely on visual features. The key to our success is two-fold. First, the use of a simple surrogate model for gradient computation greatly accelerates training without sacrificing control performance. Second, combining state representation learning with policy learning enhances convergence speed in tasks where only visual features are observable. These findings highlight the potential of differentiable simulation for real-world robotics and offer a compelling alternative to conventional RL approaches.

What problem does this paper attempt to address?

This paper addresses the low sample efficiency of reinforcement learning (RL) in robotic control, especially in vision-based control tasks. Specifically, it examines the following issues:

1. **Sample efficiency problem**:
   - RL requires a large amount of simulation data and long training times, which slows research and innovation.
   - In vision-based control tasks, the problem is particularly pronounced because reliable state estimates are not available.
2. **Limitations of traditional control methods**:
   - Traditional cascaded controllers rely on accurate state estimates. When the state estimates are inaccurate or noisy, these methods become unreliable.
   - Learning-based methods (such as model-free RL) perform well in some applications, but they face significant challenges in sample complexity and training stability, especially in high-dimensional, vision-based control tasks.
3. **Combining state representation learning and policy learning**:
   - In many real-world robotic scenarios, state information is not directly accessible, and neural networks must learn state representations and control policies simultaneously from feature observations, which increases sample complexity.

### The method proposed in the paper

To address these problems, the authors use differentiable simulation, an emerging learning approach that back-propagates gradients through the dynamics model, thereby providing low-variance analytical policy gradients and improving sample efficiency. Specific contributions include:

1. **Using a simplified surrogate model to accelerate gradient computation**:
   - Back-propagating through a simplified dynamics model significantly accelerates training without sacrificing control performance.
2. **Combining state representation learning and policy learning**:
   - In vision-based tasks, pre-training the network parameters to learn state representations accelerates convergence and improves final performance.
3. **Verification in practical applications**:
   - The paper shows that policies trained in differentiable simulation can be deployed in the real world, for example to stabilize a quadrotor after a manual throw while relying only on visual features, without state estimation.

### Experimental results

- **State-based control tasks**: Backpropagation through time (BPTT) in differentiable simulation trains more than 8 times faster than PPO (Proximal Policy Optimization) and requires over 80% fewer samples.
- **Vision-feature-based control tasks**: Even in the more complex visual control tasks, BPTT significantly outperforms PPO, converging faster and reaching higher final performance.

In conclusion, this paper demonstrates the great potential of differentiable simulation for quadrotor control, especially for improving sample efficiency and shortening training time.
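
The two ideas summarized above, analytical policy gradients obtained by back-propagating through a rollout and a simple surrogate model used only for the gradient, can be sketched in a few lines of JAX. This is an illustrative toy under stated assumptions, not the authors' implementation: the 1-D point-mass model, the drag term in `accurate_step`, the linear policy, the cost weights, and all constants are hypothetical choices made for the example. The surrogate trick shown here (forward value from the higher-fidelity step, gradient from the simple step via `jax.lax.stop_gradient`) is one common way to realize the idea and is assumed rather than taken from the paper.

```python
import jax
import jax.numpy as jnp

DT = 0.02      # integration step in seconds (assumed)
HORIZON = 50   # rollout length in steps (assumed)
G = 9.81       # gravitational acceleration in m/s^2


def simple_step(state, thrust):
    """Surrogate dynamics: 1-D point mass, state = (height, vertical velocity)."""
    z, vz = state[0], state[1]
    vz_next = vz + (thrust - G) * DT
    z_next = z + vz_next * DT
    return jnp.array([z_next, vz_next])


def accurate_step(state, thrust):
    """Hypothetical stand-in for a higher-fidelity simulator (adds linear drag)."""
    z, vz = state[0], state[1]
    vz_next = vz + (thrust - G - 0.3 * vz) * DT
    z_next = z + vz_next * DT
    return jnp.array([z_next, vz_next])


def hybrid_step(state, thrust):
    """Forward value from accurate_step, gradient from simple_step."""
    simple = simple_step(state, thrust)
    # The correction carries no gradient, so autodiff only sees simple_step.
    correction = jax.lax.stop_gradient(accurate_step(state, thrust) - simple)
    return simple + correction


def policy(params, state):
    """Linear state-feedback policy producing a mass-normalized thrust."""
    return jnp.dot(params["w"], state) + params["b"]


def rollout_loss(params, init_state):
    """Mean quadratic cost over a rollout; differentiable end to end (BPTT)."""
    def body(state, _):
        next_state = hybrid_step(state, policy(params, state))
        cost = next_state[0] ** 2 + 0.1 * next_state[1] ** 2
        return next_state, cost

    _, costs = jax.lax.scan(body, init_state, None, length=HORIZON)
    return jnp.mean(costs)


@jax.jit
def train_step(params, init_state, lr):
    """One gradient-descent step on the analytical policy gradient."""
    loss, grads = jax.value_and_grad(rollout_loss)(params, init_state)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss


def train(num_steps=200, lr=0.2):
    """Train from a hover-thrust initialization, starting 1 m above the target."""
    params = {"w": jnp.zeros(2), "b": jnp.array(G)}
    init_state = jnp.array([1.0, 0.0])
    losses = []
    for _ in range(num_steps):
        params, loss = train_step(params, init_state, lr)
        losses.append(float(loss))
    return params, losses
```

Calling `params, losses = train()` runs plain gradient descent on the rollout cost; comparing `losses[0]` with `losses[-1]` shows whether the policy improved. A model-free method such as PPO would instead estimate the same gradient from sampled returns, which is where the higher variance and sample cost come from.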

Learning Quadrotor Control From Visual Features Using Differentiable Simulation
