Abstract: The nonlinear and unstable aerodynamic interference generated by the tandem wings of dragonfly-inspired biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module, alongside a lightweight network optimized for real-time control. To validate the performance and the rationality of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that the CRL2E algorithm achieves safe and stable training within the first 500 steps, improving tracking accuracy by a factor of 14 to 66 compared to the Soft Actor-Critic, Proximal Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient algorithms. Additionally, CRL2E improves tracking accuracy under various random operating conditions by 8.3% to 60.4% compared to the Concerto Reinforcement Learning (CRL) algorithm. CRL2E converges 36.11% to 57.64% faster than the CRL algorithm with only the Composer Perturbation, and 43.52% to 65.85% faster than the CRL algorithm when both the Composer Perturbation and the Time-Interleaved Capability Perturbation are introduced, especially in conditions where the standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network structure performs well in weight loading and average inference time, meeting real-time control requirements.

What problem does this paper attempt to address?

The main problem this paper addresses is efficient, safe, and stable real-time motion control of the Direct-Drive Tandem-Wing Experimental Platform (DDTWEP) under various random operating conditions. Specifically, the research focuses on:

1. **Nonlinear and Unstable Aerodynamic Interference**: The dragonfly-inspired bionic aircraft is capable of hovering, but the nonlinear and unstable aerodynamic interference generated by its tandem wings makes motion control very challenging, especially under various random operating conditions.
2. **High-Frequency Control Requirement**: The experimental platform needs high-frequency control of a nonlinear and unstable system, so the control system must be able to learn and adapt under unknown dynamics and unknown disturbances.
3. **Limitations of Existing Algorithms**:
   - **Traditional Control Algorithms**: These algorithms cannot ensure robustness in the face of unmodeled perturbations, and developing such controllers requires a large amount of prior knowledge, simulation, and experimental data.
   - **Conventional Reinforcement Learning Algorithms**: Algorithms such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) perform well in some respects but fall short in safety, convergence speed, and multi-task adaptability.

To solve these problems, the authors propose a plug-and-play, fully online, real-time reinforcement learning algorithm named Concerto Reinforcement Learning Extension (CRL2E). This algorithm improves on existing control methods in the following ways:

- **Physics-Inspired Rule Strategy**: A method for generating a physics-inspired rule-based strategy is introduced and combined with a perturbation module to improve control accuracy.
- **Lightweight Network Structure**: An optimized lightweight network structure is designed to meet the requirements of real-time control.
- **Training Efficiency**: The CRL2E algorithm converges faster and tracks more accurately under multiple random operating conditions.

Through these improvements, the CRL2E algorithm not only improves control performance in complex environments but also ensures safety and stability in the initial training stage, thus providing a new and effective method for the control of mechanical systems.

### Formula Summary

- **Expected Motion Equation**: \[ \phi_{\text{exp},i,n} = A_i\cdot\sin(2\pi f\cdot t+\varphi_i) \] where $\phi_{\text{exp},i,n}$ represents the expected flapping-angle position of the $i$-th wing over the next $n$ steps, $A_i$ is the expected flapping amplitude of the $i$-th wing, $f$ is the flapping frequency, and $\varphi_i$ is the phase difference of the $i$-th wing.
- **Motor Torque Input**: \[ T_{M,i}=\text{Const}_{\text{motor}}\cdot\text{action} \]
- **8-Degree-of-Freedom Dynamic Equation**: \[ \ddot{\phi}_{w,1}=-\frac{J_{W,YY}\cdot K_{A,1}}{C_1}\cdot\phi_1+\frac{J_{W,YY}}{C_1}\cdot T_{M,1}+\frac{J_{W,YY}}{C_1}\cdot T_{ZW,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{YW,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{VTM,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{YAW,1} \] (the equations for the remaining degrees of freedom have a similar form)

Using these formulas and the improved algorithm, CRL2E achieves effective control of the Direct-Drive Tandem-Wing Experimental Platform and addresses the poor performance of existing methods in changing environments.
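As a rough illustration of the expected motion equation and the motor torque input summarized above, the following Python sketch evaluates the sinusoidal reference for one wing and maps a policy action to a torque. The function names, the control step `dt`, the numeric values, and the clipping of the action to [-1, 1] are all assumptions made for this example; none of them come from the paper.

```python
import math


def expected_flapping_angle(amplitude, frequency, t, phase):
    """Expected flapping angle: A_i * sin(2*pi*f*t + varphi_i)."""
    return amplitude * math.sin(2.0 * math.pi * frequency * t + phase)


def reference_window(amplitude, frequency, phase, t0, dt, n_steps):
    """Expected angles for the next n_steps control steps after time t0.

    dt is a hypothetical control period; the paper's value is not given here.
    """
    return [
        expected_flapping_angle(amplitude, frequency, t0 + k * dt, phase)
        for k in range(1, n_steps + 1)
    ]


def motor_torque(action, const_motor):
    """Motor torque: T_M,i = Const_motor * action.

    Clipping the action to [-1, 1] is an assumption of this sketch.
    """
    clipped = max(-1.0, min(1.0, action))
    return const_motor * clipped


if __name__ == "__main__":
    # Placeholder values chosen only for illustration.
    window = reference_window(
        amplitude=1.0, frequency=20.0, phase=0.0, t0=0.0, dt=0.001, n_steps=5
    )
    print(window)
    print(motor_torque(action=0.3, const_motor=0.05))
```

In a tracking setup, a window like this could serve as the target that the controller's measured flapping angle is compared against at each step, with the tracking error feeding the reward.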

A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions
