MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench

Moritz Meser, Aditya Bhatt, Boris Belousov, Jan Peters
2024-08-01
Abstract: We tackle the recently introduced benchmark for whole-body humanoid control HumanoidBench using MuJoCo MPC. We find that sparse reward functions of HumanoidBench yield undesirable and unrealistic behaviors when optimized; therefore, we propose a set of regularization terms that stabilize the robot behavior across tasks. Current evaluations on a subset of tasks demonstrate that our proposed reward function allows achieving the highest HumanoidBench scores while maintaining realistic posture and smooth control signals. Our code is publicly available and will become a part of MuJoCo MPC, enabling rapid prototyping of robot behaviors.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses whole-body humanoid robot control in simulation using Model Predictive Control (MPC), specifically MuJoCo MPC. The authors study HumanoidBench, a recently proposed benchmark suite for whole-body humanoid control, and find that optimizing its original sparse reward functions produces undesirable and unrealistic behaviors. To counter this, the paper proposes a reward function that adds a set of regularization terms to stabilize the robot's behavior, and evaluates it on a subset of the benchmark tasks. On those tasks, the proposed reward function achieves the highest HumanoidBench scores while maintaining realistic posture and smooth control signals. The authors also discuss weaknesses in the evaluation protocol, such as behavior instability caused by overly short episodes, and suggest using longer episodes and repeating tasks with varying targets to obtain more reliable results. Overall, the paper aims to obtain more stable and realistic humanoid control by pairing MuJoCo MPC with better-regularized reward functions.
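To make the idea of regularizing a sparse task reward concrete, here is a minimal sketch in Python. It is not the authors' reward function and does not use the MuJoCo MPC API: the function name `regularized_reward`, the three penalty terms (posture deviation, control magnitude, control smoothness), and the weights are all hypothetical choices made for illustration only.

```python
import numpy as np


def regularized_reward(task_reward, qpos, qpos_ref, ctrl, prev_ctrl,
                       w_posture=0.1, w_ctrl=0.01, w_smooth=0.01):
    """Hypothetical shaped reward: task reward minus weighted penalties.

    All terms and weights are illustrative assumptions, not the paper's.
    """
    # Penalize deviation of joint positions from a reference posture.
    posture_pen = float(np.sum((qpos - qpos_ref) ** 2))
    # Penalize large control signals.
    ctrl_pen = float(np.sum(ctrl ** 2))
    # Penalize abrupt changes between consecutive control signals.
    smooth_pen = float(np.sum((ctrl - prev_ctrl) ** 2))
    return (task_reward
            - w_posture * posture_pen
            - w_ctrl * ctrl_pen
            - w_smooth * smooth_pen)


# Usage: a reference posture with zero controls leaves the task reward unchanged.
r = regularized_reward(1.0, np.zeros(3), np.zeros(3), np.zeros(2), np.zeros(2))
```

With terms like these, a planner that maximizes the shaped reward is discouraged from reaching the task goal through contorted postures or jittery actuation, which is the kind of behavior the summary above attributes to the unregularized sparse rewards.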