Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control

Jonaid Shianifar, Michael Schukat, Karl Mason
2024-06-12
Abstract: In this paper, we explore the optimization of hyperparameters for the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms using the Tree-structured Parzen Estimator (TPE) in the context of robotic arm control with seven Degrees of Freedom (DOF). Our results demonstrate a significant enhancement in algorithm performance: for models trained for 50K episodes, TPE improves the success rate of SAC by 10.48 percentage points and that of PPO by 34.28 percentage points. Furthermore, TPE enables PPO to converge to a reward within 95% of the maximum reward 76% faster than without TPE, which translates to about 40K fewer training episodes required for optimal performance. For SAC, convergence is 80% faster than without TPE. This study underscores the impact of advanced hyperparameter optimization on the efficiency and success of deep reinforcement learning algorithms in complex robotic tasks.
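The convergence figures in the abstract rest on a metric of the form "episodes needed to reach 95% of the maximum reward". The sketch below shows one plausible way to compute such a metric and the resulting relative saving. The function names, the reading of "faster" as a relative reduction in episodes, and the two reward curves are assumptions made for illustration; the curves are synthetic and are not data from the paper.

```python
def episodes_to_threshold(rewards, fraction=0.95):
    """Return the 1-based episode at which the reward first reaches
    `fraction` of the curve's maximum, or None if it never does.

    Assumption: the maximum reward is positive, so that
    fraction * max(rewards) lies below the maximum.
    """
    if not rewards:
        return None
    target = fraction * max(rewards)
    for episode, reward in enumerate(rewards, start=1):
        if reward >= target:
            return episode
    return None


def relative_reduction(baseline_episodes, tuned_episodes):
    """Fraction of episodes saved by the tuned run relative to the baseline."""
    return (baseline_episodes - tuned_episodes) / baseline_episodes


# Synthetic reward curves, invented purely for illustration (not paper data).
baseline_curve = [min(i, 100) for i in range(1, 201)]
tuned_curve = [min(5 * i, 100) for i in range(1, 201)]

baseline_hit = episodes_to_threshold(baseline_curve)
tuned_hit = episodes_to_threshold(tuned_curve)
saving = relative_reduction(baseline_hit, tuned_hit)
print(baseline_hit, tuned_hit, round(saving, 2))
```

In practice the reward curve would usually be smoothed (for example with a moving average) before applying the threshold, since raw episode rewards are noisy; whether the paper does so is not stated here.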
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is how to improve the performance of deep reinforcement learning (DRL) algorithms for controlling a robotic arm with seven degrees of freedom (DOF) by optimizing hyperparameters. Specifically, the authors use the Tree-structured Parzen Estimator (TPE) to tune the hyperparameters of the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms, with the aim of raising both the success rate and the convergence speed on robotic arm control tasks.

### Main Problem Summary:

1. **Improve Learning Efficiency**:
   - Optimizing the hyperparameters of SAC and PPO with TPE significantly improves the learning efficiency of the models.
   - TPE reduces the number of training episodes PPO needs to reach 95% of the maximum reward by approximately 40,630, a 76.32% speedup.
   - For SAC, learning is 80.39% faster with TPE than without it.
2. **Increase Task Success Rate**:
   - The success rates of the optimized SAC and PPO models on 100,000 test targets improve significantly.
   - With TPE-optimized hyperparameters, the success rate of SAC rises from 3.22% at 20,000 training rounds to 89.75% at 100,000 training rounds.
   - The success rate of PPO likewise rises from 8.72% to 89.41%.
3. **Accelerate Convergence Speed**:
   - The SAC and PPO models optimized by TPE reach a higher average reward in less time, indicating faster convergence to the optimal policy.

### Solutions:

- **Hyperparameter Optimization**: Use TPE to search for more effective hyperparameter configurations.
- **Experimental Verification**: Evaluate the optimized models at different training stages across a large number of experiments to confirm their stability and reliability.
- **Environment Simulation**: Run simulation tests with the Franka Emika Panda robotic arm in the PyBullet and Gymnasium environments to ensure the safety and repeatability of the experiments.

Through these methods, the paper demonstrates the effectiveness of TPE for hyperparameter optimization in deep reinforcement learning, providing more efficient and accurate solutions for complex robotic tasks.
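To make the hyperparameter optimization step more concrete, here is a minimal, standard-library-only sketch of the idea behind TPE for a single hyperparameter. It is not the authors' implementation, and the summary does not say which TPE library or search space they used. The search range (a log10 learning rate between -5 and -2), the `toy_objective` function, and all parameter names and defaults are hypothetical; a real objective would train SAC or PPO and return a score such as the success rate.

```python
import math
import random


def toy_objective(log_lr):
    """Hypothetical stand-in for 'train an agent and return its score'.
    Peaks at log10(lr) = -3.5; a real objective would train SAC or PPO."""
    return -(log_lr + 3.5) ** 2


def kde(points, bandwidth):
    """Return a Gaussian kernel (Parzen) density estimate over `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / norm + 1e-12  # small floor avoids division by zero

    return density


def tpe_search(objective, low, high, n_trials=60, n_startup=10,
               gamma=0.25, n_candidates=24, seed=0):
    """Maximize `objective` on [low, high] with a simplified 1-D TPE.
    Assumes n_startup >= 2 and 0 < gamma < 1 so both groups are non-empty."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed kernel width (a simplification)
    trials = []  # (value, score) pairs

    for t in range(n_trials):
        if t < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split past trials into a "good" top fraction and the rest.
            ranked = sorted(trials, key=lambda tr: tr[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [p for p, _ in ranked[:n_good]]
            bad = [p for p, _ in ranked[n_good:]]
            good_pdf, bad_pdf = kde(good, bandwidth), kde(bad, bandwidth)
            # Sample candidates from the "good" density, clipped to bounds.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Pick the candidate with the highest good/bad density ratio.
            x = max(candidates, key=lambda c: good_pdf(c) / bad_pdf(c))
        trials.append((x, objective(x)))

    return max(trials, key=lambda tr: tr[1])


best_log_lr, best_score = tpe_search(toy_objective, -5.0, -2.0)
print(f"best log10(lr) = {best_log_lr:.3f}, score = {best_score:.4f}")
```

The key design choice is that TPE models where good and bad configurations tend to lie, rather than modeling the score directly, and then proposes the point that looks most typical of the good group and least typical of the bad one. Established libraries such as Optuna and Hyperopt provide production-grade TPE samplers that handle multiple, conditional, and categorical hyperparameters.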