Fuzzy Ensembles of Reinforcement Learning Policies for Robotic Systems with Varied Parameters

Abdel Gafoor Haddad, Mohammed B. Mohiuddin, Igor Boiko, Yahya Zweiri
2023-11-09
Abstract: Reinforcement Learning (RL) is an emerging approach to controlling dynamical systems for which classical control approaches are inapplicable or insufficient. However, the resulting policies may not generalize to variations in the system's parameters. This paper presents a powerful yet simple algorithm that facilitates collaboration between RL agents trained independently to perform the same task but with different system parameters. The independence among agents allows multi-core processing to be exploited for parallel training. Two examples are provided to demonstrate the effectiveness of the proposed technique. The main demonstration is a tracking problem for a quadrotor with a slung load, performed in a real-time experimental setup. It is shown that the ensemble produced by the developed algorithm outperforms individual policies by reducing the tracking RMSE. The robustness of the ensemble is also verified against wind disturbance.
Robotics, Systems and Control
What problem does this paper attempt to address?
The paper addresses the poor generalization of Reinforcement Learning (RL) policies when system parameters change. It proposes a simple algorithm that uses fuzzy clustering to combine multiple independently trained RL agents, so that together they can handle control tasks for robotic systems with different parameters.

The main contributions of the paper are as follows:

1. **A fuzzy clustering-based ensemble of RL agents**: The method controls dynamic systems and maintains stable performance when their parameters change, online or offline, without requiring additional training.
2. **Experimental validation**: The authors tested the proposed algorithm on a real quadrotor UAV carrying a suspended load. The ensemble significantly reduced the tracking Root Mean Square Error (RMSE) compared with individual policies.

The paper also discusses related work, such as Domain Randomization (DR) and ensemble learning in RL, and points out their respective limitations. Through theoretical analysis, simulation, and experimental results, the authors argue that the proposed method has advantages over traditional DR techniques, especially when system parameters change over time.
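To make the ensemble idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each agent was trained at a known scalar parameter value (a "center"), that the current parameter value is available at run time, and that the blending weights follow the standard fuzzy c-means membership formula with fuzzifier `m`. The names `fuzzy_memberships` and `ensemble_action` are hypothetical, and the exact membership rule used in the paper is not given in this summary.

```python
from typing import Callable, List, Sequence

# A policy maps an observation to an action vector (hypothetical interface).
Policy = Callable[[Sequence[float]], List[float]]


def fuzzy_memberships(param: float, centers: Sequence[float],
                      m: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Fuzzy c-means style memberships of `param` in each center (sum to 1).

    Assumption: u_i is proportional to d_i ** (-2 / (m - 1)), where d_i is
    the distance from `param` to center i and m > 1 is the fuzzifier.
    """
    dists = [abs(param - c) for c in centers]
    # If the parameter coincides with a center, that agent gets full weight.
    for i, d in enumerate(dists):
        if d < eps:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** (-exponent) for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_action(obs: Sequence[float], param: float,
                    policies: Sequence[Policy], centers: Sequence[float],
                    m: float = 2.0) -> List[float]:
    """Blend the actions of independently trained policies by membership."""
    weights = fuzzy_memberships(param, centers, m)
    actions = [policy(obs) for policy in policies]
    dim = len(actions[0])
    return [sum(w * a[k] for w, a in zip(weights, actions))
            for k in range(dim)]


# Two toy stand-in "policies": proportional controllers with different
# gains, imagined as trained at parameter values 0.1 and 0.3.
policies = [lambda obs: [-1.0 * obs[0]], lambda obs: [-3.0 * obs[0]]]
centers = [0.1, 0.3]

# Midway between the two centers, both policies contribute about equally.
blended = ensemble_action([1.0], 0.2, policies, centers)
```

Because the weights depend only on the current parameter value, no retraining is involved when the parameter drifts between the trained values; the blend simply shifts toward the nearest agent.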