What problem does this paper attempt to address?

This paper addresses how to design an effective controller for an underactuated double-pendulum system (in both the acrobot and pendubot configurations) for the AI Olympics competition, so that the pendulum swings up and is then stabilized. The competition has two phases:

1. **Simulation phase**:
   - Design a controller that swings the robot up and stabilizes it at the vertical position.
   - The controller must complete the task within 10 seconds, with the system simulated at 500 Hz.
   - The robustness of the controller is evaluated.
2. **Actual hardware phase**:
   - Test the performance of the controller on the physical system.
   - Bridge the differences between the simulation environment and the real environment, such as discrepancies in mass, length, and friction effects.

To achieve these goals, the paper proposes a method based on model-free deep reinforcement learning combined with an evolutionary strategy. The steps are:

- **Initial training**: Use the Soft Actor-Critic (SAC) algorithm to train the agent on the main task (swing-up and stabilization). SAC adds an entropy term to the objective, which promotes exploration and improves the robustness of the policy.
- **Fine-tuning**: Further optimize the agent with an evolutionary algorithm, such as Separable Natural Evolution Strategies (SNES), so that it better matches the actual scoring function of the competition.
- **Reward function design**: Because the competition's scoring function is complex and hard to optimize directly, the paper designs a surrogate reward function that is easier to optimize during training.

With this approach, the paper aims for strong results in simulation and for a controller that also performs well and robustly on the actual hardware.

### Key formulas

1. **Optimization objective of SAC**:

   \[ J(\pi)=\mathbb{E}_{s_t, a_t\sim\pi}\left[\sum_{t}\gamma^t\left(r(s_t, a_t)+\alpha H(\pi(\cdot|s_t))\right)\right] \]

   where \(H\) is the entropy of the policy and \(\alpha\) is a temperature parameter that controls the importance of the entropy term.

2. **Surrogate reward function**:

   \[ R(s, a)=\begin{cases} V+\alpha[1 + \cos(\theta_2)]^2-\beta T&\text{if }y > y_{th}\\ -\rho_1a^2-\phi_1\Delta a+V-\rho_2a^2-\phi_2\Delta a-\eta\|\dot{s}\|^2&\text{otherwise} \end{cases} \]

   where:
   - \(V\) is the potential energy of the system,
   - \(T\) is the kinetic energy of the system,
   - \(a\) is the normalized action,
   - \(\Delta a\) is the difference between the current action and the previous action,
   - \(\|\dot{s}\|^2 = \dot{\theta}_1^2+\dot{\theta}_2^2\) is the squared norm of the angular velocity of the robot,
   - \(\alpha\) in this formula multiplies the \([1 + \cos(\theta_2)]^2\) term; it is a reward coefficient, not the SAC temperature from the objective above.

Taken together, the paper aims for a solution that can be trained efficiently and that also copes with the practical challenges of real hardware.
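To make the piecewise surrogate reward concrete, the following is a minimal Python sketch that evaluates the formula term by term as it is written in this summary. The function name `surrogate_reward`, the `RewardWeights` container, and every numeric default are hypothetical placeholders chosen for illustration; they are not the paper's tuned values, and the meaning of \(y\) and \(y_{th}\) is taken as given by the formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Coefficients of the surrogate reward (placeholder values, assumptions only)."""
    alpha: float = 1.0   # multiplies [1 + cos(theta_2)]^2
    beta: float = 0.1    # multiplies the kinetic energy T
    rho_1: float = 0.01  # multiplies a^2 (first action penalty)
    phi_1: float = 0.01  # multiplies delta_a (first smoothness penalty)
    rho_2: float = 0.01  # multiplies a^2 (second action penalty)
    phi_2: float = 0.01  # multiplies delta_a (second smoothness penalty)
    eta: float = 0.01    # multiplies the squared angular-velocity norm
    y_th: float = 0.4    # threshold compared against y


def surrogate_reward(V, T, theta_2, a, a_prev, theta_dot, y, w=RewardWeights()):
    """Evaluate R(s, a) following the two cases of the formula as stated.

    V, T      : potential and kinetic energy of the system
    theta_2   : second joint angle (radians)
    a, a_prev : current and previous normalized actions
    theta_dot : pair (theta_1_dot, theta_2_dot) of angular velocities
    y         : quantity compared with the threshold y_th
    """
    if y > w.y_th:
        # First case: V + alpha * [1 + cos(theta_2)]^2 - beta * T
        return V + w.alpha * (1.0 + math.cos(theta_2)) ** 2 - w.beta * T

    # Second case: action, smoothness, and velocity penalties around V.
    delta_a = a - a_prev  # signed difference, as defined in the text
    speed_sq = theta_dot[0] ** 2 + theta_dot[1] ** 2
    return (
        -w.rho_1 * a ** 2 - w.phi_1 * delta_a
        + V
        - w.rho_2 * a ** 2 - w.phi_2 * delta_a
        - w.eta * speed_sq
    )
```

In an SAC training loop, a function like this would be called once per simulation step in place of the competition score, which is only available at the end of a rollout. Passing a different `RewardWeights` instance is one way to experiment with the coefficients without touching the formula itself.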

AI Olympics challenge with Evolutionary Soft Actor Critic

Learning control of underactuated double pendulum with Model-Based Reinforcement Learning

Velocity-History-Based Soft Actor-Critic Tackling IROS'24 Competition "AI Olympics with RealAIGym"

AI-Olympics: Exploring the Generalization of Agents through Open Competitions

The AI Driving Olympics at NeurIPS 2018

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Generative Adversarial Neuroevolution for Control Behaviour Imitation

Evolutionary reinforcement learning algorithm for large-scale multi-agent cooperation and confrontation applications

Neuroevolution of Recurrent Architectures on Control Tasks

Multi-AI competing and winning against humans in iterated Rock-Paper-Scissors game

Multi-Agent Interplay in a Competitive Survival Environment

Deep Q-Network for AI Soccer

Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition

Integrating the Latest Artificial Intelligence Algorithms into the RoboCup Rescue Simulation Framework

SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability

Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO

Moving Beyond the Turing Test with the Allen AI Science Challenge

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Evolving Strategies for Competitive Multi-Agent Search