ConcertoRL: An Innovative Time-Interleaved Reinforcement Learning Approach for Enhanced Control in Direct-Drive Tandem-Wing Vehicles

Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang
2024-05-22
Abstract: In control problems for insect-scale direct-drive experimental platforms under tandem-wing influence, the primary challenges facing existing reinforcement learning models are their limited safety during exploration and the limited stability of continuous training. We introduce the ConcertoRL algorithm to enhance control precision and stabilize the online training process. It consists of two main innovations: a time-interleaved mechanism that interweaves classical controllers with reinforcement learning-based controllers to improve control precision in the initial stages, and a policy composer that organizes the experience gained from previous learning to ensure the stability of the online training process. This paper conducts a series of experiments. First, experiments incorporating the time-interleaved mechanism demonstrate a performance boost of approximately 70% over scenarios without reinforcement learning enhancements and a 50% increase in efficiency compared to reference controllers with doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts.
Artificial Intelligence
### What problem does this paper attempt to solve?

This paper aims to address precise control of direct-drive tandem-wing vehicles, in particular improving control accuracy and safety during plug-in online training and control. Specifically:

1. **Control Challenges**:
   - Existing reinforcement learning models offer limited safety during exploration and insufficient stability during continuous training.
   - The direct-drive experimental platform is affected by tandem-wing interference, leading to nonlinear and unstable aerodynamic characteristics, which pose challenges for control.
2. **Proposed Solution**:
   - The ConcertoRL algorithm is proposed to enhance control accuracy and stabilize the online training process through two main innovations:
     - **Time-Interleaved Mechanism**: Interleaves classical controllers with reinforcement learning-based controllers to improve control accuracy in the initial stages.
     - **Policy Composer**: Organizes the experience gained from previous learning to ensure the stability of the online training process.
3. **Experimental Validation**:
   - With the time-interleaved mechanism, performance improved by approximately 70% over scenarios without reinforcement learning enhancements, and efficiency increased by 50% compared to reference controllers with doubled control frequencies.
   - The policy composer further enhanced the stability of ConcertoRL's online training.
   - Generalization experiments demonstrated that ConcertoRL is compatible with various classical controllers and can achieve excellent control performance under different parameters.

In summary, this paper aims to solve the accuracy and stability problems in the control of direct-drive tandem-wing vehicles through the ConcertoRL algorithm.
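To make the time-interleaved idea more concrete, here is a minimal Python sketch of one possible reading of it: a classical controller and a learned controller take turns issuing commands on alternating control steps. This is only an illustration under stated assumptions, not the ConcertoRL implementation. The names `make_p_controller`, `interleave`, and `schedule`, the proportional control law, the `period` parameter, and the strict alternation rule are all hypothetical; the summary above does not specify how the two controllers are actually scheduled.

```python
from typing import Callable, List

# A controller maps a tracking error to a control command (assumed interface).
Controller = Callable[[float], float]


def make_p_controller(gain: float) -> Controller:
    """Hypothetical classical controller: proportional action on the error."""
    def control(error: float) -> float:
        return gain * error
    return control


def interleave(classical: Controller, learned: Controller,
               period: int = 2) -> Callable[[int, float], float]:
    """Build a controller that switches between the two by time step.

    Assumption: steps where step % period == 0 use the classical controller,
    and every other step uses the learned (RL) controller.
    """
    if period < 2:
        raise ValueError("period must be at least 2 so both controllers act")

    def control(step: int, error: float) -> float:
        if step % period == 0:
            return classical(error)
        return learned(error)

    return control


def schedule(n_steps: int, period: int = 2) -> List[str]:
    """List which controller acts at each step, for inspection."""
    return ["classical" if k % period == 0 else "learned"
            for k in range(n_steps)]


if __name__ == "__main__":
    # Stand-in for a trained policy; a real RL policy would replace this lambda.
    controller = interleave(make_p_controller(2.0), lambda e: 0.5 * e)
    for step, label in enumerate(schedule(4)):
        print(step, label, controller(step, 1.0))
```

In this sketch the learned policy only fills the time slots between classical control steps, so the classical controller still acts at its original rate. That is one way to read the comparison against reference controllers with doubled control frequencies, though the actual scheduling may differ.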