Abstract: The nonlinear and unstable aerodynamic interference between tandem wings in biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module, together with a lightweight network optimized for real-time control. To validate its performance and the soundness of the module design, seven algorithms were compared under six challenging operating conditions. The results show that CRL2E achieves safe and stable training within the first 500 steps and improves tracking accuracy by a factor of 14 to 66 relative to the Soft Actor-Critic, Proximal Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient algorithms. Under various random operating conditions, CRL2E also improves tracking accuracy by 8.3% to 60.4% relative to the Concerto Reinforcement Learning (CRL) algorithm. CRL2E converges 36.11% to 57.64% faster than CRL when only the Composer Perturbation is introduced, and 43.52% to 65.85% faster when both the Composer Perturbation and the Time-Interleaved Capability Perturbation are introduced; the advantage is most pronounced in conditions where standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network excels in weight loading and average inference time, meeting real-time control requirements.
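The abstract does not spell out the composer's rules, so the following is only an illustrative sketch of what a rule-based policy composer with a perturbation module might look like. It assumes the composer blends a conventional baseline controller's action with the RL actor's action using a rule on the tracking error, and that the perturbation module randomly jitters the blend weight. `ComposerConfig`, `blend_weight`, `compose_action`, and every threshold below are hypothetical names and values, not the CRL2E implementation.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ComposerConfig:
    # Hypothetical thresholds; NOT taken from CRL2E.
    error_low: float = 0.05     # |error| at or below this: use the RL action only
    error_high: float = 0.5     # |error| at or above this: use the baseline action only
    perturb_std: float = 0.1    # std of the Gaussian jitter applied to the blend weight
    action_limit: float = 1.0   # symmetric actuator saturation


def blend_weight(tracking_error: float, cfg: ComposerConfig) -> float:
    """Rule-based weight on the RL action: 1 for small errors, 0 for large, linear between."""
    e = abs(tracking_error)
    if e <= cfg.error_low:
        return 1.0
    if e >= cfg.error_high:
        return 0.0
    return (cfg.error_high - e) / (cfg.error_high - cfg.error_low)


def compose_action(
    tracking_error: float,
    rl_action: float,
    baseline_action: float,
    cfg: ComposerConfig,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Blend RL and baseline actions; if rng is given, perturb the weight (hypothetical perturbation module)."""
    w = blend_weight(tracking_error, cfg)
    if rng is not None:
        # Jitter the weight for exploration, then clamp it back to [0, 1].
        w = min(1.0, max(0.0, w + rng.gauss(0.0, cfg.perturb_std)))
    action = w * rl_action + (1.0 - w) * baseline_action
    # Saturate to the actuator limits before sending the command.
    action = max(-cfg.action_limit, min(cfg.action_limit, action))
    return action, w


if __name__ == "__main__":
    cfg = ComposerConfig()
    rng = random.Random(0)
    for err in (0.01, 0.275, 2.0):
        a, w = compose_action(err, rl_action=0.4, baseline_action=-0.2, cfg=cfg, rng=rng)
        print(f"error={err:+.3f}  weight={w:.3f}  action={a:+.3f}")
```

In this sketch, small tracking errors hand control to the RL action while large errors fall back to the baseline controller, which is one common way to keep on-the-job training safe; the physics-inspired rules actually used in CRL2E may differ.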

A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions