Multi-Agent Reinforcement Learning for the Low-Level Control of a Quadrotor UAV

Beomyeol Yu, Taeyoung Lee
2024-02-27
Abstract: By leveraging the underlying structures of the quadrotor dynamics, we propose multi-agent reinforcement learning frameworks to innovate the low-level control of a quadrotor, where independent agents operate cooperatively to achieve a common goal. While single-agent reinforcement learning has been successfully applied in quadrotor controls, training a large monolithic network is often data-intensive and time-consuming. Moreover, achieving agile yawing control remains a significant challenge due to the strongly coupled nature of the quadrotor dynamics. To address this, we decompose the quadrotor dynamics into translational and yawing components and assign collaborative reinforcement learning agents to each part to facilitate more efficient training. Additionally, we introduce regularization terms to mitigate steady-state errors and prevent excessive maneuvers. Benchmark studies, including sim-to-sim transfer verification, demonstrate that our proposed training schemes substantially improve the convergence rate of training, while enhancing flight control performance and stability compared to traditional single-agent approaches.
Robotics, Systems and Control
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to innovate the low-level control of quadrotor UAVs through Multi-Agent Reinforcement Learning (MARL). Specifically, it proposes a framework that decomposes the quadrotor dynamics into translational and yawing parts and assigns each part to its own cooperative reinforcement learning agent, with the goal of more efficient training and more stable flight control. The main contributions of the paper include:

1. **Improved flight stability and robustness**: Decoupling yaw control from position control significantly improves flight stability, particularly in large-angle flights.
2. **Reduced overall training time**: Each agent focuses on its own task, which allows faster convergence with less training data than a single, more complex agent requires.
3. **Introduction of regularization terms**: Two regularization terms are introduced, one to reduce steady-state errors in position tracking and one to prevent excessive control inputs.
4. **Flexible training scheme**: The proposed scheme can be combined with any multi-agent reinforcement learning algorithm. In this paper it is implemented with the Multi-Agent TD3 (MATD3) algorithm, which shows significant advantages in training efficiency over the single-agent TD3 algorithm.
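
The decomposition and the two regularization terms described above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the action split (the translational agent outputs thrust plus roll and pitch moments, the yaw agent outputs the yaw moment), the reward shapes, the weights `k_pos`, `k_int`, `k_yaw`, `k_act`, and the function names `combine_actions`, `translational_reward`, and `yaw_reward`.

```python
import numpy as np


def combine_actions(trans_action, yaw_action):
    """Merge both agents' outputs into one command [thrust, M1, M2, M3].

    Assumed split (hypothetical): the translational agent emits
    [thrust, roll moment, pitch moment]; the yaw agent emits [yaw moment].
    """
    trans_action = np.asarray(trans_action, dtype=float)
    yaw_action = np.asarray(yaw_action, dtype=float)
    assert trans_action.shape == (3,) and yaw_action.shape == (1,)
    return np.concatenate([trans_action, yaw_action])


def translational_reward(pos_err, int_pos_err, trans_action,
                         k_pos=1.0, k_int=0.1, k_act=0.05):
    """Reward for the translational agent with two regularizers.

    The weights are illustrative placeholders, not values from the paper.
    """
    tracking = -k_pos * np.linalg.norm(pos_err)
    # Regularizer 1: penalize accumulated (integral) position error,
    # which discourages a persistent steady-state offset.
    steady_state = -k_int * np.linalg.norm(int_pos_err)
    # Regularizer 2: penalize large roll/pitch moments (excessive inputs).
    effort = -k_act * np.linalg.norm(trans_action[1:])
    return tracking + steady_state + effort


def yaw_reward(yaw_err, yaw_action, k_yaw=1.0, k_act=0.05):
    """Reward for the yaw agent: heading error plus a control-effort penalty."""
    return -k_yaw * abs(yaw_err) - k_act * abs(yaw_action[0])


# Example step with made-up numbers.
pos_err = np.array([0.3, 0.0, 0.4])       # position tracking error [m]
int_pos_err = np.array([0.0, 0.0, 1.0])   # integrated position error
a_trans = np.array([0.5, 0.0, 0.0])       # translational agent's action
a_yaw = np.array([0.2])                   # yaw agent's action

command = combine_actions(a_trans, a_yaw)
r_trans = translational_reward(pos_err, int_pos_err, a_trans)
r_yaw = yaw_reward(0.1, a_yaw)
print(command, r_trans, r_yaw)
```

Because each agent receives its own reward and outputs only its own slice of the command, a multi-agent algorithm such as MATD3 could in principle train the two policies side by side, while a single-agent baseline would have to learn all four outputs from one combined reward.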