Toward Understanding Key Estimation in Learning Robust Humanoid Locomotion

Zhicheng Wang, Wandi Wei, Ruiqi Yu, Jun Wu, Qiuguo Zhu
2024-03-09
Abstract: Accurate state estimation plays a critical role in ensuring the robust control of humanoid robots, particularly in the context of learning-based control policies for legged robots. However, there is a notable gap in analytical research concerning estimations. Therefore, we endeavor to further understand how various types of estimations influence the decision-making processes of policies. In this paper, we provide quantitative insight into the effectiveness of learned state estimations, employing saliency analysis to identify key estimation variables and optimize their combination for humanoid locomotion tasks. Evaluations assessing tracking precision and robustness are conducted on comparative groups of policies with varying estimation combinations in both simulated and real-world environments. Results validated that the proposed policy is capable of crossing the sim-to-real gap and demonstrating superior performance relative to alternative policy configurations.
Robotics
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Understanding the role of key state estimation in reinforcement learning control policies**: The researchers examine how different types of estimations influence a policy's decision-making process and evaluate the effectiveness of learned state estimations through quantitative analysis.
2. **Determining the optimal combination of estimation variables**: Saliency analysis identifies the key estimation variables, and their combination is then optimized to improve performance on humanoid locomotion tasks.
3. **Designing a highly adaptable learning framework**: The paper proposes a controllable and highly adaptable framework, based on an asymmetric actor-critic structure, for learning humanoid locomotion.

Specifically, the researchers focus on the following aspects:

- **Methodology**: Policies are trained with an asymmetric actor-critic structure. The actor can access only the last 0.5 seconds of observation history (delayed and noisy proprioceptive information plus commands), while the critic can access all types of system state.
- **State and action definitions**: States are divided into observations, privileged information, and commands, and the content of each type is defined in detail.
- **Reward design**: A bell-shaped kernel function is introduced into the reward design to encourage policies to survive in complex environments.
- **Saliency analysis**: The integrated gradients method from explainable artificial intelligence is used to quantify the importance of different estimation variables.
- **Experimental setup and results**: The effectiveness of policies with different estimation combinations is evaluated through a series of simulated and real-world experiments, which verify that policies containing the most relevant estimation variables achieve the best overall performance.
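The integrated gradients step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hidden-layer `toy_policy`, its random weights, the all-zero baseline, and the midpoint Riemann sum are all assumptions chosen to keep the example self-contained. It attributes a scalar output to each input variable by averaging gradients along a straight path from the baseline to the input.

```python
import numpy as np

def toy_policy(x, W1, w2):
    """Hypothetical stand-in for a policy output: a one-hidden-layer tanh network."""
    return float(w2 @ np.tanh(W1 @ x))

def toy_policy_grad(x, W1, w2):
    """Analytic gradient of toy_policy with respect to the input x."""
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * mean over alpha of
                    d f / d x_i evaluated at baseline + alpha * (x - baseline)
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    avg_grad = np.zeros_like(x)
    for a in alphas:
        avg_grad += grad_fn(baseline + a * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad

# Assumed toy setup: 5 input variables (e.g. estimated quantities), 8 hidden units.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 5))
w2 = rng.normal(size=8)
x = rng.normal(size=5)
baseline = np.zeros(5)  # assumed reference input

attr = integrated_gradients(lambda z: toy_policy_grad(z, W1, w2), x, baseline)

# Normalised absolute attributions give a relative importance score per variable.
saliency = np.abs(attr) / np.abs(attr).sum()
print(saliency)
```

A useful sanity check is the completeness property of integrated gradients: the attributions should sum, up to discretisation error, to the difference between the output at the input and at the baseline. Ranking variables by `saliency` is one plausible way to compare estimation variables, under the assumptions stated above.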
The main contributions of the paper include:

- A quantitative analysis of how estimation variables affect the performance of learned policies, together with a proposed optimal combination of estimation variables.
- A controllable and highly adaptable learning framework for humanoid locomotion, based on an asymmetric actor-critic structure.
- Real-world tests of the proposed framework and estimation methods that demonstrate their adaptability to outdoor environments.

In summary, this paper aims to provide a theoretical basis and technical support for enhancing the stability and adaptability of humanoid robots in complex environments through in-depth analysis and experimental evaluation.