Abstract: The nonlinear and unstable aerodynamic interference generated by the tandem wings of dragonfly-inspired biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module alongside a lightweight network optimized for real-time control. To validate the performance and the rationality of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that the CRL2E algorithm achieves safe and stable training within the first 500 steps, improving tracking accuracy by 14 to 66 times compared to the Soft Actor-Critic, Proximal Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient algorithms. Additionally, CRL2E significantly enhances performance under various random operating conditions, with improvements in tracking accuracy ranging from 8.3% to 60.4% compared to the Concerto Reinforcement Learning (CRL) algorithm. The convergence speed of CRL2E is 36.11% to 57.64% faster than the CRL algorithm with only the Composer Perturbation and 43.52% to 65.85% faster than the CRL algorithm when both the Composer Perturbation and Time-Interleaved Capability Perturbation are introduced, especially in conditions where the standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network structure excels in weight loading and average inference time, meeting real-time control requirements.
What problem does this paper attempt to address?
The paper aims to achieve efficient, safe, and stable real-time motion control of the Direct-Drive Tandem-Wing Experimental Platform (DDTWEP) under a range of random operating conditions. Specifically, the research focuses on:
1. **Nonlinear and Unstable Aerodynamic Interference**: The platform is modeled on a dragonfly-inspired bionic aircraft capable of hovering. The nonlinear and unstable aerodynamic interference generated by its tandem wings makes motion control difficult, especially under random operating conditions.
2. **High-Frequency Control Requirement**: The platform needs high-frequency control of a nonlinear, unstable system, so the controller must learn and adapt while the dynamics and disturbances are unknown.
3. **Limitations of Existing Algorithms**:
- **Traditional Control Algorithms**: These cannot guarantee robustness against unmodeled perturbations, and developing them requires a large amount of prior knowledge, simulation, and experimental data.
- **Conventional Reinforcement Learning Algorithms**: Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) perform well in some respects but fall short in safety, convergence speed, and multi-task adaptability.
To solve these problems, the authors propose a plug-and-play, fully online, real-time reinforcement learning algorithm named Concerto Reinforcement Learning Extension (CRL2E). The algorithm improves on existing control methods in the following ways:
- **Physics-Inspired Rule Strategy**: A physics-inspired, rule-based policy composer strategy is introduced and combined with a perturbation module to improve control accuracy.
- **Lightweight Network Structure**: An optimized lightweight network structure is designed to meet real-time control requirements.
- **Training Efficiency**: CRL2E converges faster and tracks more accurately under multiple random operating conditions.
With these improvements, CRL2E improves control performance in complex environments and also keeps the initial training stage safe and stable, offering a new and effective method for controlling mechanical systems.
### Formula Summary
- **Expected Motion Equation**:
\[
\phi_{\text{exp},i,n} = A_i\cdot\sin(2\pi f\cdot t+\varphi_i)
\]
where $\phi_{\text{exp},i,n}$ is the expected flapping-angle position of the $i$-th wing over the next $n$ steps, $A_i$ is the expected flapping amplitude of the $i$-th wing, $f$ is the flapping frequency, and $\varphi_i$ is the phase offset of the $i$-th wing.
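As a minimal sketch of how the expected motion equation could be evaluated, the Python below computes the sinusoidal reference for one wing and samples it over a short horizon. The function names, the control period `dt`, and the convention that step `k` falls at `t_now + k * dt` are illustrative assumptions, not details taken from the paper.

```python
import math


def expected_flapping_angle(amplitude, frequency_hz, t, phase):
    """Evaluate phi_exp = A_i * sin(2*pi*f*t + varphi_i) for one wing."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t + phase)


def reference_horizon(amplitude, frequency_hz, phase, t_now, dt, n_steps):
    """Expected angles for the next n_steps control steps.

    Assumption: step k is sampled at t_now + k * dt (hypothetical
    discretization; the paper's exact indexing is not given here).
    """
    return [
        expected_flapping_angle(amplitude, frequency_hz, t_now + k * dt, phase)
        for k in range(1, n_steps + 1)
    ]
```

For a tandem-wing setup, calling `reference_horizon` once per wing with different `phase` values would give the fore and hind wing references.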
- **Motor Torque Input**:
\[
T_{M,i}=\text{Const}_{\text{motor}}\cdot\text{action}
\]
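The torque mapping above is a single scaling, which might look like the following sketch. Clipping the action to `[-action_limit, action_limit]` before scaling is an added assumption (a common convention for bounded policy outputs), and `motor_torque`, `const_motor`, and `action_limit` are hypothetical names.

```python
def motor_torque(action, const_motor, action_limit=1.0):
    """Map a policy action to motor torque: T_M = Const_motor * action.

    Assumption: the action is clipped to [-action_limit, action_limit]
    first; the summary does not state the action bounds.
    """
    clipped = max(-action_limit, min(action_limit, action))
    return const_motor * clipped
```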
- **8-Degree-of-Freedom Dynamic Equation**:
\[
\ddot{\phi}_{w,1}=-\frac{J_{W,YY}\cdot K_{A,1}}{C_1}\cdot\phi_1+\frac{J_{W,YY}}{C_1}\cdot T_{M,1}+\frac{J_{W,YY}}{C_1}\cdot T_{ZW,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{YW,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{VTM,1}-\frac{J_{W,YZ}}{C_1}\cdot T_{YAW,1}
\]
\]
(others are similar)
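To make the structure of the first dynamic equation easier to read, here is a sketch that evaluates its right-hand side term by term. The field and argument names only mirror the symbols in the formula; their physical meaning and values are not defined in this summary, so the container `WingParams` and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WingParams:
    """Coefficients of the wing-1 equation (names mirror the symbols only)."""
    j_w_yy: float  # J_{W,YY}
    j_w_yz: float  # J_{W,YZ}
    k_a: float     # K_{A,1}
    c: float       # C_1


def wing_angular_acceleration(p, phi, t_m, t_zw, t_yw, t_vtm, t_yaw):
    """Right-hand side of the wing-1 flapping equation.

    The J_{W,YY}/C_1 factor multiplies the restoring term and the
    torques T_M and T_ZW; the J_{W,YZ}/C_1 factor multiplies
    T_YW, T_VTM, and T_YAW with a negative sign.
    """
    yy = p.j_w_yy / p.c
    yz = p.j_w_yz / p.c
    return -yy * p.k_a * phi + yy * (t_m + t_zw) - yz * (t_yw + t_vtm + t_yaw)
```

Grouping the terms by their shared coefficient shows that the equation has only two distinct gains, which keeps the per-step cost small in a high-frequency control loop.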
Using these formulas together with the improved algorithm, CRL2E achieves effective control of the Direct-Drive Tandem-Wing Experimental Platform and addresses the poor performance of existing methods under varying operating conditions.