Developmental Reinforcement Learning of Control Policy of a Quadcopter UAV with Thrust Vectoring Rotors

Aditya M. Deshpande, Rumit Kumar, Ali A. Minai, Manish Kumar
DOI: https://doi.org/10.48550/arXiv.2007.07793
2020-07-16
Abstract: In this paper, we present a novel developmental reinforcement learning-based controller for a quadcopter with thrust vectoring capabilities. This multirotor UAV design has tilt-enabled rotors. It utilizes the rotor force magnitude and direction to achieve the desired state during flight. The control policy of this robot is learned using policy transfer from the learned controller of the quadcopter (a comparatively simple UAV design without thrust vectoring). This approach allows learning a control policy for systems with multiple inputs and multiple outputs. The performance of the learned policy is evaluated by physics-based simulations for the tasks of hovering and way-point navigation. The flight simulations utilize a flight controller based on reinforcement learning without any additional PID components. The results show faster learning with the presented approach as opposed to learning the control policy from scratch for this new UAV design, which is created by modifying a conventional quadcopter through the addition of more degrees of freedom (from 4 actuators in the conventional quadcopter to 8 actuators in the tilt-rotor quadcopter). We demonstrate the robustness of our learned policy by showing the recovery of the tilt-rotor platform in simulation from various non-static initial conditions in order to reach a desired state. The developmental policy for the tilt-rotor UAV also showed superior fault tolerance when compared with the policy learned from scratch. The results show the ability of the presented approach to bootstrap the learned behavior from a simpler system (lower-dimensional action space) to a more complex robot (comparatively higher-dimensional action space) and reach better performance faster.
Robotics, Artificial Intelligence, Machine Learning, Systems and Control
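The abstract describes bootstrapping a policy with a 4-dimensional action space into one with an 8-dimensional action space. The page does not say how the authors implement this transfer, so the following is only a minimal sketch of one plausible scheme, not the paper's method: the output layer of a small policy network is widened, the thrust rows are copied, and the new tilt rows start at zero. The network sizes, the names `init_policy`, `act`, and `expand_action_space`, and the convention that a tilt action of 0 means an untilted rotor are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 18   # hypothetical observation size (not taken from the paper)
HIDDEN = 32    # hypothetical hidden-layer width
N_THRUST = 4   # actuators of the conventional quadcopter
N_TILT = 4     # extra tilt actuators of the tilt-rotor quadcopter


def init_policy(obs_dim, hidden, n_actions):
    """Create a tiny two-layer policy; stands in for a trained quadcopter policy."""
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, obs_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (n_actions, hidden)),
        "b2": np.zeros(n_actions),
    }


def act(policy, obs):
    """Deterministic action: tanh hidden layer, tanh-squashed outputs in [-1, 1]."""
    h = np.tanh(policy["W1"] @ obs + policy["b1"])
    return np.tanh(policy["W2"] @ h + policy["b2"])


def expand_action_space(policy, n_new):
    """Widen the output layer by n_new actions.

    Existing rows are copied unchanged; new rows are zero, so the new
    outputs are exactly 0 until further training changes them.
    """
    hidden = policy["W2"].shape[1]
    return {
        "W1": policy["W1"].copy(),
        "b1": policy["b1"].copy(),
        "W2": np.vstack([policy["W2"], np.zeros((n_new, hidden))]),
        "b2": np.concatenate([policy["b2"], np.zeros(n_new)]),
    }


quad_policy = init_policy(OBS_DIM, HIDDEN, N_THRUST)    # 4 thrust outputs
tilt_policy = expand_action_space(quad_policy, N_TILT)  # 4 thrust + 4 tilt outputs

obs = rng.normal(size=OBS_DIM)
a4 = act(quad_policy, obs)
a8 = act(tilt_policy, obs)
```

Under these assumptions, the widened policy initially reproduces the quadcopter's thrust commands and leaves the rotors untilted, so training on the 8-actuator platform would start from the simpler system's behavior rather than from random actions.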
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is to develop a novel reinforcement learning controller for a quadcopter UAV with thrust-vectoring capabilities. Specifically, the goals of the paper are:

1. **Design control strategies**: By using developmental reinforcement learning, develop an effective control strategy for a quadcopter UAV with thrust-vectoring rotors. This multirotor UAV is designed with tiltable rotors, which can utilize the magnitude and direction of the rotor forces to achieve the desired state during flight.
2. **Accelerate the learning process**: Accelerate the learning of control strategies for complex systems by transferring knowledge from the learning strategies of simpler quadcopter UAVs (without thrust-vectoring capabilities). This enables the system to learn how to control the new UAV design with more degrees of freedom (from 4 actuators to 8 actuators) in a shorter time.
3. **Improve robustness and fault tolerance**: Verify the robustness and fault tolerance of the learned control strategies. Experiments show that, compared to learning from scratch, the developmental learning method demonstrates better recovery capabilities and higher fault-tolerance performance under various non-static initial conditions.

### Main contributions

- **First application of deep reinforcement learning control**: Although previous studies have explored the use of reinforcement learning algorithms to learn control strategies for quadcopter UAVs, this paper is the first to apply this method to the more complex tilt-rotor quadcopter UAV.
- **Advantages of phased learning**: Demonstrate that curriculum learning can learn better-quality control strategies for high-degree-of-freedom systems more quickly, reducing the number of iterations.

### Method overview

- **Dynamic model**: The paper describes in detail the dynamic model of the tilt-rotor quadcopter UAV and presents the equations of motion.
These equations describe the translational and rotational motions of the UAV in the world coordinate system.

\[
\begin{bmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{bmatrix}
= \frac{R_{B/E}}{m}
\begin{bmatrix}
F_2 s\theta_2 + F_4 s\theta_4 \\
- F_1 s\theta_1 - F_3 s\theta_3 \\
F_1 c\theta_1 + F_2 c\theta_2 + F_3 c\theta_3 + F_4 c\theta_4
\end{bmatrix}
- \begin{bmatrix} 0 \\ 0 \\ g \end{bmatrix}
\]

\[
I \begin{bmatrix} \dot{p} \\ \dot{q} \\ \dot{r} \end{bmatrix}
= \begin{bmatrix}
l(F_2 c\theta_2 - F_4 c\theta_4) + M_2 s\theta_2 - M_4 s\theta_4 \\
l(F_3 c\theta_3 - F_1 c\theta_1) - M_3 s\theta_3 + M_1 s\theta_1 \\
l(- F_1 s\theta_1 - F_2 s\theta_2 + F_3 s\theta_3 + F_4 s\theta_4) - M_1 c\theta_1 + M_2 c\theta_2 + M_3 c\theta_3 - M_4 c\theta_4
\end{bmatrix}
- \begin{bmatrix} p \\ q \\ r \end{bmatrix} \times I \begin{bmatrix} p \\ q \\ r \end{bmatrix}
\]

- **Policy training