Robust Deep Reinforcement Learning for Quadcopter Control

Aditya M. Deshpande, Ali A. Minai, Manish Kumar
DOI: https://doi.org/10.48550/arXiv.2111.03915
2021-11-07
Abstract: Deep reinforcement learning (RL) has made it possible to solve complex robotics problems using neural networks as function approximators. However, the policies trained on stationary environments suffer in terms of generalization when transferred from one environment to another. In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy, which combines ideas from Robust Control and RL. It opts for pessimistic optimization to handle potential gaps between policy transfer from one environment to another. The trained control policy is tested on the task of quadcopter positional control. RL agents were trained in a MuJoCo simulator. During testing, different environment parameters (unseen during the training) were used to validate the robustness of the trained policy for transfer from one environment to another. The robust policy outperformed the standard agents in these environments, suggesting that the added robustness increases generality and can adapt to non-stationary environments. Code: https://github.com/adipandas/gym_multirotor
Robotics, Artificial Intelligence, Machine Learning, Systems and Control, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the poor generalization of quadcopter control policies across environmental conditions. Policies trained with standard reinforcement learning in a single, stationary environment can degrade significantly when transferred to an environment whose parameters differ (for example mass, moment of inertia, air resistance, or friction). To overcome this, the paper trains the quadcopter control policy with a Robust Markov Decision Process (RMDP), which combines ideas from robust control and deep reinforcement learning and optimizes pessimistically to cover gaps between the training and deployment environments. The goal is a control policy that maintains high performance in non-stationary environments. In the experiments, the researchers trained RL agents in the MuJoCo simulator and evaluated robustness on quadcopter positional control by testing with environment parameters that were not seen during training. The robust policy outperformed standard RL agents in these environments, suggesting that the added robustness improves the generality of the policy and lets it adapt to non-stationary environments.
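The pessimistic, worst-case idea behind robust training can be illustrated with a small toy sketch. This is not the paper's algorithm, simulator, or code: it replaces the neural-network policy with a PD controller on a one-dimensional point mass, and every name and number in it (`EnvParams`, `rollout_return`, `worst_case_return`, `pick_robust_gains`, the gains, masses, and drag values) is a hypothetical choice made only for illustration. It scores each candidate controller by its lowest return over a set of perturbed environment parameters and keeps the candidate whose worst case is best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvParams:
    """Hypothetical environment parameters that may change at test time."""
    mass: float = 1.0  # kg (illustrative nominal value)
    drag: float = 0.1  # linear drag coefficient (illustrative)


def rollout_return(kp, kd, params, steps=200, dt=0.02, target=1.0):
    """Simulate a PD position controller on a 1-D point mass.

    Gravity is assumed to be compensated elsewhere. The return is the
    negative time-integrated absolute position error, so higher is better.
    """
    z, vz = 0.0, 0.0
    total = 0.0
    for _ in range(steps):
        thrust = kp * (target - z) - kd * vz
        acc = (thrust - params.drag * vz) / params.mass
        vz += acc * dt  # semi-implicit Euler integration
        z += vz * dt
        total -= abs(target - z) * dt
    return total


def worst_case_return(kp, kd, param_set):
    """Pessimistic score: the lowest return over all parameter settings."""
    return min(rollout_return(kp, kd, p) for p in param_set)


def pick_robust_gains(candidates, param_set):
    """Max-min selection: keep the gains with the best worst-case return."""
    return max(candidates, key=lambda g: worst_case_return(g[0], g[1], param_set))


if __name__ == "__main__":
    # Nominal parameters plus two perturbed settings (all values illustrative).
    param_set = [EnvParams(1.0, 0.1), EnvParams(0.5, 0.1), EnvParams(2.0, 0.3)]
    candidates = [(4.0, 2.0), (1.0, 0.5), (8.0, 3.0)]
    kp, kd = pick_robust_gains(candidates, param_set)
    print("robust gains:", (kp, kd))
    print("worst-case return:", worst_case_return(kp, kd, param_set))
```

The same evaluation loop mirrors the test protocol described above: a controller tuned only on the nominal parameters can be compared with the max-min choice by running both on parameter settings held out from selection.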