Learning Generalizable Policy for Obstacle-Aware Autonomous Drone Racing

Yueqian Liu
2024-11-07
Abstract: Autonomous drone racing has gained attention for its potential to push the boundaries of drone navigation technologies. While much of the existing research focuses on racing in obstacle-free environments, few studies have addressed the complexities of obstacle-aware racing, and approaches presented in these studies often suffer from overfitting, with learned policies generalizing poorly to new environments. This work addresses the challenge of developing a generalizable obstacle-aware drone racing policy using deep reinforcement learning. We propose applying domain randomization on racing tracks and obstacle configurations before every rollout, combined with parallel experience collection in randomized environments to achieve the goal. The proposed randomization strategy is shown to be effective through simulated experiments where drones reach speeds of up to 70 km/h, racing in unseen cluttered environments. This study serves as a stepping stone toward learning robust policies for obstacle-aware drone racing and general-purpose drone navigation in cluttered environments. Code is available at https://github.com/ErcBunny/IsaacGymEnvs.
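The per-rollout randomization described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the author's IsaacGymEnvs code:

- the names `EnvConfig`, `sample_env_config` and `collect_rollouts` are hypothetical;
- the gate and obstacle sampling ranges are hypothetical;
- so is the (x, y, z, yaw) / (x, y, z, radius) layout representation.

It shows only the structural idea. Every parallel environment draws its own track and obstacle layout, and all layouts are resampled before each rollout, so the policy never trains on one fixed course.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class EnvConfig:
    """Hypothetical layout of one environment: racing gates plus obstacles."""
    gates: list = field(default_factory=list)      # (x, y, z, yaw) per gate
    obstacles: list = field(default_factory=list)  # (x, y, z, radius) per obstacle


def sample_env_config(rng, num_gates_range=(3, 7), num_obstacles_range=(0, 20),
                      gate_spacing=(4.0, 10.0), arena_half_size=20.0):
    """Draw a fresh random track and obstacle layout (all ranges are assumptions)."""
    cfg = EnvConfig()
    x = y = 0.0
    heading = rng.uniform(-math.pi, math.pi)
    for _ in range(rng.randint(*num_gates_range)):
        # Turn by a random angle, then step forward a random distance to the next gate.
        heading += rng.uniform(-math.pi / 3, math.pi / 3)
        step = rng.uniform(*gate_spacing)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        cfg.gates.append((x, y, rng.uniform(1.0, 4.0), heading))
    for _ in range(rng.randint(*num_obstacles_range)):
        # Obstacles are scattered uniformly in a square arena around the origin.
        cfg.obstacles.append((rng.uniform(-arena_half_size, arena_half_size),
                              rng.uniform(-arena_half_size, arena_half_size),
                              rng.uniform(0.0, 5.0),
                              rng.uniform(0.2, 1.0)))
    return cfg


def collect_rollouts(num_envs, num_rollouts, seed=0):
    """Re-randomize every parallel environment before each rollout."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_rollouts):
        # One independent layout per environment, resampled for every rollout.
        configs = [sample_env_config(rng) for _ in range(num_envs)]
        # Stepping the simulator with the current policy and storing the
        # transitions would happen here; both are out of scope for this sketch.
        history.append(configs)
    return history
```

In a GPU simulator such as Isaac Gym, the environments would be stepped as one batch rather than held in a Python list. A practical generator would also reject obstacles that block gates or the straight path between them.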
Robotics, Artificial Intelligence
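The quadrotor dynamics model given under Formula Summary below can be integrated numerically. The following is a minimal Python sketch under stated assumptions, not the paper's simulator. Its assumptions are:

- explicit Euler integration;
- Hamilton quaternions stored as (w, x, y, z);
- a diagonal inertia matrix `J_diag`;
- forces and torques passed directly as body-frame inputs, with no rotor thrust model;
- hypothetical parameter values in the usage lines.

Each line of `dynamics_step` mirrors one row of the state equation.

```python
import math


def quat_mul(q, r):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def quat_to_rot(q):
    """Rotation matrix R_WB (body to world) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def mat_vec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dynamics_step(state, f_a, tau_a, f_d, tau_d, rotor_cmd, params, dt):
    """Advance (p_W, q_WB, v_W, omega_B, Omega) by one explicit-Euler step."""
    p, q, v, w, rotor = state
    m, J, k_r, g = params["m"], params["J_diag"], params["k_r"], params["g_W"]
    p_dot = list(v)                                                # p_dot = v_W
    q_dot = [0.5 * c for c in quat_mul(q, (0.0, w[0], w[1], w[2]))]  # 1/2 q (x) [0, w]
    f_world = mat_vec(quat_to_rot(q), [f_a[i] + f_d[i] for i in range(3)])
    v_dot = [g[i] + f_world[i] / m for i in range(3)]              # g_W + R_WB f / m
    gyro = cross(w, [J[i] * w[i] for i in range(3)])               # omega x J omega
    w_dot = [(tau_a[i] + tau_d[i] - gyro[i]) / J[i] for i in range(3)]
    rotor_dot = [k_r * (s - o) for s, o in zip(rotor_cmd, rotor)]  # first-order motor lag

    p_new = [p[i] + dt * p_dot[i] for i in range(3)]
    q_new = [q[i] + dt * q_dot[i] for i in range(4)]
    norm = math.sqrt(sum(c * c for c in q_new))
    q_new = [c / norm for c in q_new]  # renormalize: a numerical safeguard, not part of the model
    v_new = [v[i] + dt * v_dot[i] for i in range(3)]
    w_new = [w[i] + dt * w_dot[i] for i in range(3)]
    rotor_new = [o + dt * d for o, d in zip(rotor, rotor_dot)]
    return p_new, q_new, v_new, w_new, rotor_new


# Usage with hypothetical parameters: level hover thrust exactly cancels gravity.
params = {"m": 0.8, "J_diag": (0.002, 0.002, 0.004), "k_r": 20.0, "g_W": (0.0, 0.0, -9.81)}
zero = (0.0, 0.0, 0.0)
state = ([0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0], [0.0] * 3, [0.0] * 3, [0.0] * 4)
hover_thrust = (0.0, 0.0, params["m"] * 9.81)
state = dynamics_step(state, hover_thrust, zero, zero, zero, [100.0] * 4, params, dt=0.01)
```

Explicit Euler keeps the sketch short. A semi-implicit or Runge-Kutta integrator is the more common choice when accuracy at large time steps matters.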
What problem does this paper attempt to address?
This paper addresses the problem of developing a generalizable, obstacle-aware policy for autonomous drone racing in diverse, cluttered environments. Most existing research focuses on drone racing in obstacle-free environments, and comparatively little addresses obstacle-filled ones. Existing methods also tend to overfit to the environments they were trained in, so they perform poorly in new ones. This paper therefore aims to train a policy that generalizes across different tracks and obstacle configurations, using deep reinforcement learning (DRL) and domain randomization.

### Main Problem Description

1. **Obstacle awareness**: Most existing drone racing research ignores obstacles or tests only in simple environments. In practical applications, however, drones must navigate quickly through cluttered environments, which places higher demands on the algorithms.
2. **Policy generalization**: Existing methods are usually trained in specific environments, so the learned policies do not generalize well to new, unseen ones. This limits their usefulness in practice.
3. **Balancing high-speed flight and obstacle avoidance**: Autonomous drone racing requires the drone not only to fly fast but also to avoid collisions at high speed. Avoiding obstacles reliably without sacrificing speed is a challenge.

### Solution

To address these problems, the author proposes a generalizable, obstacle-aware drone racing policy based on deep reinforcement learning. The specific methods are:

- **Domain Randomization**: The track and obstacle configurations are randomized before every rollout, exposing the drone to diverse environments during training.
  This improves the generalization of the policy and enables it to perform well in unseen environments.
- **Parallel Experience Collection**: Experience is collected in parallel across many randomized environments, further improving the robustness and adaptability of the policy.
- **Efficient Learning Framework**: An efficient, GPU-accelerated simulator such as Isaac Gym is used to speed up training.

Using these methods, the author shows in simulation that the proposed policy achieves high-speed, safe flight on unseen tracks and obstacle configurations, with speeds of up to 70 km/h. This provides a foundation for future work on autonomous drone racing and on general drone navigation in cluttered environments.

### Formula Summary

The formulas in this paper mainly describe the drone's dynamics model and the reward function. The drone dynamics are:

\[
\begin{bmatrix} \dot{p}_W \\ \dot{q}_{WB} \\ \dot{v}_W \\ \dot{\omega}_B \\ \dot{\Omega} \end{bmatrix}
=
\begin{bmatrix} v_W \\ \frac{1}{2} q_{WB} \otimes \begin{bmatrix} 0 & \omega_B^T \end{bmatrix}^T \\ g_W + \frac{1}{m} R_{WB} (f_a + f_d) \\ J^{-1} (\tau_a + \tau_d - \omega_B \times J \omega_B) \\ k_r (\Omega_s - \Omega) \end{bmatrix}
\]

The symbols are:

- \( p_W \) and \( v_W \) are the position and linear velocity in the world frame \( W \).
- \( q_{WB} \) is the attitude quaternion of the body frame \( B \) relative to \( W \), and \( R_{WB} \) is the corresponding rotation matrix.
- \( \omega_B \) is the body-frame angular velocity.
- \( g_W \) is the gravitational acceleration.
- \( m \) is the drone's mass and \( J \) is its inertia matrix.
- \( f_a \) and \( \tau_a \) are the force and torque generated by the actuators.
- \( f_d \) and \( \tau_d \) are the aerodynamic drag force and torque.
- \( \Omega \) and \( \Omega_s \) are the actual and commanded rotor speeds. The last row therefore models a first-order rotor response with rate constant \( k_r \).

In addition, the reward function \( r \) is a weighted sum of multiple reward terms:

\[ r = \lambda [r_{prog} \