Curiosity-Driven Reinforcement Learning based Low-Level Flight Control

Amir Ramezani Dooraki, Alexandros Iosifidis
DOI: https://doi.org/10.48550/arXiv.2307.15724
2023-07-28
Abstract: Curiosity is one of the main motives in many of the natural creatures with measurable levels of intelligence for exploration and, as a result, more efficient learning. It makes it possible for humans and many animals to explore efficiently by searching for being in states that make them surprised with the goal of learning more about what they do not know. As a result, while being curious, they learn better. In the machine learning literature, curiosity is mostly combined with reinforcement learning-based algorithms as an intrinsic reward. This work proposes an algorithm based on the drive of curiosity for autonomous learning to control by generating proper motor speeds from odometry data. The quadcopter controlled by our proposed algorithm can pass through obstacles while controlling the yaw direction of the quadcopter toward the desired location. To achieve that, we also propose a new curiosity approach based on prediction error. We ran tests using on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm and visualized the effect of curiosity in evolving exploration patterns. Results show the capability of the proposed algorithm to learn optimal policy and maximize reward where other algorithms fail to do so.
Machine Learning, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
This paper aims to develop a curiosity-driven reinforcement learning algorithm for low-level flight control of quadcopters. Specifically, the controller learns to generate appropriate motor speeds directly from odometry data, so that the quadcopter can pass through obstacles while controlling its yaw direction toward the desired position.

### Main Problems

1. **Low-level flight control**: Traditional methods struggle to generate suitable motor speeds directly from sensor data (such as odometry data) with the precision that flight control requires.
2. **Autonomous exploration and learning**: In a complex environment, the drone must learn an optimal policy autonomously, maximizing cumulative reward without getting stuck in local optima.
3. **Curiosity mechanism**: Curiosity is introduced as an intrinsic motivation to promote more effective exploration and learning, especially in unknown or dynamic environments.

### Solutions

To address these problems, the authors propose a new prediction-error-based curiosity method and apply it within a reinforcement learning framework. The main contributions are:

- **New curiosity calculation method**: A prediction-error-based curiosity reward drives the drone to keep seeking novel states during exploration.
- **Multiple value-function heads**: The external reward and the curiosity reward are processed separately, which stabilizes learning.
- **Curiosity module in high-dimensional state space**: The curiosity module operates on segments of states and segments of actions, making it better suited to low-level control tasks.
- **Trajectory update mechanism**: A decay factor is used to update the curiosity values of the states along a trajectory, so that they better reflect long-term impact.
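The contributions above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the curiosity reward is the mean squared error of a forward model's next-state prediction, and it interprets the decay-factor update as an ordinary discounted backward pass over the trajectory. The function names, the toy data, and the value `gamma=0.5` are all hypothetical.

```python
import numpy as np

def curiosity_rewards(pred_next_states, next_states):
    """Intrinsic reward per step: mean squared prediction error.

    Assumption: a forward model (not shown) produced pred_next_states.
    """
    pred = np.asarray(pred_next_states, dtype=float)
    actual = np.asarray(next_states, dtype=float)
    return np.mean((pred - actual) ** 2, axis=1)

def discounted_returns(rewards, gamma):
    """Backward pass over a trajectory: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Toy three-step trajectory with 2-D states (hypothetical data).
next_states = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
pred_next_states = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
extrinsic = np.array([0.0, 0.0, 1.0])

# Only the second step was mispredicted, so only it earns curiosity reward.
intrinsic = curiosity_rewards(pred_next_states, next_states)

# Separate targets, one per value-function head.
ext_targets = discounted_returns(extrinsic, gamma=0.5)
int_targets = discounted_returns(intrinsic, gamma=0.5)
```

Keeping `ext_targets` and `int_targets` as separate arrays mirrors the idea of separate value-function heads: each head regresses onto its own target, so the scale of the curiosity signal does not distort the estimate of the external return.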
### Experimental Verification

To verify the effectiveness of the proposed method, the authors designed a simulation environment containing a quadcopter and three obstacles with randomly initialized positions. The proposed algorithm was compared against on-policy, off-policy, and on-policy-plus-curiosity baselines. The results show that it learns an optimal policy and maximizes reward where the other algorithms fail to do so, passing through the obstacles while controlling the yaw direction.

With these improvements, the work enhances the autonomous flight ability of quadcopters in complex environments and suggests new directions for applying reinforcement learning to robot control.
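As a small illustration of what controlling the yaw direction toward the desired location involves, the sketch below computes the signed heading error between the current yaw and the bearing to a target. It is an assumption made for illustration only: the summary does not state how the paper measures or rewards yaw alignment, and the function name and planar (x, y) simplification are hypothetical.

```python
import math

def yaw_error(position_xy, yaw, target_xy):
    """Signed angle (radians) from the current yaw to the bearing of the target.

    The result is wrapped to [-pi, pi) so that small misalignments stay small
    even when the raw angles straddle the +/-pi discontinuity.
    """
    dx = target_xy[0] - position_xy[0]
    dy = target_xy[1] - position_xy[1]
    desired_yaw = math.atan2(dy, dx)
    return (desired_yaw - yaw + math.pi) % (2 * math.pi) - math.pi

# Already facing the target along +x: no error.
aligned = yaw_error((0.0, 0.0), 0.0, (1.0, 0.0))

# Target lies along +y while facing +x: a quarter turn to the left.
quarter_turn = yaw_error((0.0, 0.0), 0.0, (0.0, 1.0))
```

A term such as the absolute value of this error could be penalized in an extrinsic reward, but whether the paper does so is not stated in this summary.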