What problem does this paper attempt to address?

The problem this paper addresses is how to use advanced control algorithms to stabilize chaotic under-actuated dynamic systems in the IROS'24 competition "AI Olympics with RealAIGym". Specifically, the competition requires contestants to design a method that swings the robot up from a hanging position and stabilizes it in the upright state, while keeping the system stable and robust in the face of random disturbances.

### Problem Background

1. **Competition Objectives**:
   - The competition aims to evaluate the motion intelligence of robots through standardized benchmark tasks, especially for under-actuated double inverted pendulum systems (such as Pendubot and Acrobot).
   - Performance evaluation includes a performance score from a single simulation run and a robustness score from multiple simulation runs. The latter takes into account the effects of physical parameter changes, noise, and other perturbations.
2. **Challenges**:
   - Under-actuation and the chaotic nature of these systems make control very difficult.
   - Random disturbances (such as strong thrusts) occur at unpredictable times during the swing-up and stabilization phases, increasing the control difficulty.
   - Sim-to-Real gap: policies trained in the simulation environment may perform poorly on real systems.

### Solutions

To address these challenges, the authors propose an improved method based on the Soft Actor-Critic (SAC) algorithm, called Velocity-History-Based Soft Actor-Critic. The main innovations include:

1. **History Encoding**:
   - A "context" vector is introduced into the state representation. This vector encodes past velocity measurements through a convolutional neural network (CNN) to capture the historical information of the system.
   - This helps the model better understand and adapt to dynamic changes in non-Markovian or partially observable systems.
2. **Reward Design**:
   - A dense reward function is designed to provide richer feedback signals, thereby accelerating learning and improving the performance of the final policy.
   - The specific reward function \( R_2(s, a) \) contains two main terms: the squared angular distance term and the regularization term \( E(s, a) \), which penalizes large angular velocities and torque changes.
3. **System Identification**:
   - The Sim-to-Real gap is narrowed by optimizing physical parameters so that the behavior of the simulation environment is more consistent with that of the real system.
   - The differential evolution algorithm is used to minimize the difference between the simulated trajectory and the real trajectory.
4. **Multi-environment Training**:
   - The robustness of the model is improved by training in multiple different perturbation environments.
   - Certain types of perturbations (such as torque perturbations and action noise) are excluded to avoid over-complicating the training process.

### Results

The experimental results show that the proposed method improves both performance and robustness on the Pendubot system. Compared with baseline controllers (such as iLQR and TVLQR), it performs well in terms of swing-up time and energy consumption. It also outperforms existing methods on multiple evaluation metrics, reaching a robustness score of 0.905.

In conclusion, the paper addresses the stable control of under-actuated dynamic systems by introducing historical information encoding and an optimized reward design, and achieves excellent results in the competition.
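The history-encoding idea described under Solutions can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `VelocityHistoryEncoder`, the helper `conv1d_relu`, the state layout `(q1, q2, dq1, dq2)`, the history length, and the hand-set finite-difference kernels are all assumptions made for illustration. In the paper's setting the convolution weights would presumably be learned jointly with the SAC networks rather than fixed by hand.

```python
from collections import deque


def conv1d_relu(history, kernels, biases):
    """Valid 1D convolution over time followed by ReLU.

    history: list of T rows, each holding C channel values (joint velocities).
    kernels: list of F filters, each a list of K rows with C weights.
    Returns F feature maps, each of length T - K + 1.
    """
    n_steps = len(history)
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        fmap = []
        for t in range(n_steps - width + 1):
            acc = bias
            for k in range(width):
                for c, weight in enumerate(kernel[k]):
                    acc += weight * history[t + k][c]
            fmap.append(max(0.0, acc))  # ReLU
        feature_maps.append(fmap)
    return feature_maps


class VelocityHistoryEncoder:
    """Hypothetical helper: keeps a sliding window of past joint velocities
    and appends a pooled convolutional 'context' vector to the state."""

    def __init__(self, kernels, biases, history_len=8, n_joints=2):
        self.kernels = kernels
        self.biases = biases
        # Zero-padded window so the context is defined from the first step.
        self.buffer = deque(
            [[0.0] * n_joints for _ in range(history_len)],
            maxlen=history_len,
        )

    def observe(self, state):
        """state is assumed to be (q1, q2, dq1, dq2); returns state + context."""
        self.buffer.append(list(state[2:]))
        fmaps = conv1d_relu(list(self.buffer), self.kernels, self.biases)
        context = [sum(f) / len(f) for f in fmaps]  # global average pooling
        return list(state) + context


# Illustrative fixed kernels: one finite-difference filter per joint.
# These stand in for learned weights and are NOT taken from the paper.
kernels = [
    [[-1.0, 0.0], [1.0, 0.0]],  # velocity change of joint 1
    [[0.0, -1.0], [0.0, 1.0]],  # velocity change of joint 2
]
encoder = VelocityHistoryEncoder(kernels, biases=[0.0, 0.0], history_len=4)
augmented = encoder.observe((0.0, 0.0, 1.0, 2.0))
print(augmented)  # four state entries followed by two context entries
```

The augmented observation would then be passed to the actor and critic in place of the raw state. Because the context summarizes how the velocities have changed over the window, a sudden push shows up in the input even though the instantaneous state alone does not reveal it. Note that the ReLU in this sketch discards negative responses, so a practical encoder would use more filters than the two shown here.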

Velocity-History-Based Soft Actor-Critic Tackling IROS'24 Competition "AI Olympics with RealAIGym"
