Abstract: Traversing risky terrains with sparse footholds presents significant challenges for legged robots, requiring precise foot placement in safe areas. Current learning-based methods often rely on implicit feature representations without supervising physically significant estimation targets. This limits the policy's ability to fully understand complex terrain structures, which is critical for generating accurate actions. In this paper, we utilize end-to-end reinforcement learning to traverse risky terrains with high sparsity and randomness. Our approach integrates proprioception with single-view depth images to reconstruct the robot's local terrain, enabling a more comprehensive representation of terrain information. Meanwhile, by incorporating implicit and explicit estimations of the robot's state and its surroundings, we improve the policy's environmental understanding, leading to more precise actions. We deploy the proposed framework on a low-cost quadrupedal robot, achieving agile and adaptive locomotion across various challenging terrains and demonstrating outstanding performance in real-world scenarios. Video: http://youtu.be/ReQAR4D6tuc
### What problem does this paper attempt to address?
This paper addresses **stable and fast traversal of complex terrains with sparse footholds by quadruped robots**. Such terrains include irregularly distributed stones, balance beams, and gaps, where the robot must place each foot precisely to stay safe.
### Problem Background and Challenges
1. **Tight Coupling between Environmental Perception and Motion Planning**:
- On complex terrains with sparse footholds, the robot needs to accurately perceive the environment and flexibly adjust its pace according to the perception results while maintaining balance and stability.
- Each step must be accurately placed within a safe area, which poses extremely high requirements for the robot's perception and control.
2. **Limitations of Existing Methods**:
- **Traditional Model-Based Methods**: Rely on manually designed models, which suit specific scenarios but are difficult to extend to more complex terrains.
- **End-to-End Reinforcement Learning Methods**: Although they can learn a direct mapping from vision and proprioception to actions, they perform poorly on sparse and highly random terrains, mainly because they lack a sufficient understanding and representation of complex terrain structures.
3. **Partial Observability and Information Memory**:
- Because the robot's field of view is limited, it cannot directly see all of the terrain (such as the areas below and behind it), so the policy must remember past observations and integrate them into a coherent terrain representation.
- Existing methods struggle with this partial observability, which makes exploration and learning difficult.
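The memory requirement described in the last point can be illustrated with a small sketch. It is not the paper's learned approach; it is a hand-written stand-in that only shows why remembered heights matter once terrain passes out of the camera's view. The one-dimensional strip layout, the use of `None` for unknown cells, and the function names `shift_forward` and `fuse` are all assumptions made for this example.

```python
from typing import List, Optional

Cell = Optional[float]  # None marks a cell with no height estimate yet


def shift_forward(memory: List[Cell], cells_moved: int) -> List[Cell]:
    """Re-index a robot-centric height strip after the robot advances.

    Index 0 is the rearmost cell and the last index is the frontmost one.
    Cells that fall off the rear are dropped; new front cells are unknown.
    """
    if cells_moved <= 0:
        return list(memory)
    kept = memory[cells_moved:]
    return kept + [None] * (len(memory) - len(kept))


def fuse(memory: List[Cell], observation: List[Cell]) -> List[Cell]:
    """Prefer fresh observations; fall back to remembered heights."""
    if len(memory) != len(observation):
        raise ValueError("memory and observation must have the same length")
    return [obs if obs is not None else mem
            for mem, obs in zip(memory, observation)]


# Hypothetical example: the camera only sees the three frontmost cells.
strip = fuse([None] * 6, [None, None, None, 0.0, -1.0, 0.0])  # gap at index 4
strip = shift_forward(strip, 2)                               # robot walks 2 cells
strip = fuse(strip, [None, None, None, 0.0, 0.0, 0.0])
# The gap now sits at index 2, outside the current view, but is still known.
```

Without the remembered strip, the gap would vanish from the representation exactly when the rear feet have to step over it.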
### The Method Proposed in the Paper
To solve the above problems, the paper proposes an end-to-end reinforcement learning framework with two key components:
1. **Locomotion Policy**:
- By combining proprioception and local heightmaps, it uses implicit-explicit estimation to infer the state of the robot and its surrounding environment, thereby extracting environmental features comprehensively and generating precise actions.
2. **Terrain Reconstructor**:
- It uses proprioception and egocentric depth images to accurately reconstruct local heightmaps, including the areas below and behind the robot. This provides an explicit physical supervision target for the locomotion policy, enhancing its understanding and representation of terrain features.
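One way to picture implicit-explicit estimation is as an encoder output that is split into two parts: an explicit part trained against a physically meaningful target, and an implicit latent that receives no direct supervision. The sketch below assumes exactly that split. The summary does not specify the encoder architecture, which quantities are estimated explicitly, or the loss used, so the mean squared error and the names `split_estimate`, `explicit_loss`, and `policy_input` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_estimate(encoder_out: Sequence[float],
                   n_explicit: int) -> Tuple[List[float], List[float]]:
    """Split an encoder output into (explicit estimate, implicit latent)."""
    if not 0 <= n_explicit <= len(encoder_out):
        raise ValueError("n_explicit must lie within the encoder output size")
    return list(encoder_out[:n_explicit]), list(encoder_out[n_explicit:])


def explicit_loss(explicit: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between the explicit estimate and its target.

    Assumption: the target is a physical quantity, such as heights from a
    local heightmap. The implicit latent is deliberately not part of this loss.
    """
    if not explicit or len(explicit) != len(target):
        raise ValueError("explicit and target must be non-empty and equal length")
    return sum((e - t) ** 2 for e, t in zip(explicit, target)) / len(explicit)


def policy_input(proprio: Sequence[float],
                 explicit: Sequence[float],
                 latent: Sequence[float]) -> List[float]:
    """Concatenate everything the action network would consume."""
    return list(proprio) + list(explicit) + list(latent)


# Hypothetical numbers: two explicit values followed by a 3-d implicit latent.
explicit, latent = split_estimate([0.5, 0.0, 0.1, -0.2, 0.3], n_explicit=2)
loss = explicit_loss(explicit, target=[0.5, 0.2])
obs = policy_input([1.0], explicit, latent)
```

In a setup like this, the supervised part keeps the representation tied to measurable terrain quantities, while the latent remains free to encode whatever else helps the policy.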
### Main Contributions
- Proposes a one-stage locomotion policy based on implicit-explicit estimation, which can comprehensively extract environmental features and generate precise actions.
- Designs a terrain reconstructor that can accurately reconstruct local heightmaps using only proprioception and egocentric depth images, providing an explicit physical supervision target and significantly improving the understanding and representation of terrain features.
- Achieves zero-shot transfer from simulation to the real world, demonstrating adaptability and robustness on sparse footholds and complex terrains.
With these improvements, the method achieves agile and stable locomotion in various challenging real-world environments.