Abstract: This article describes enhancements to an autonomous navigation and obstacle avoidance system for a quadruped robot dog. Part one presents the integration of a multi-level dynamic control framework that uses Model Predictive Control (MPC) and Whole-Body Control (WBC) from the MIT Cheetah. The system uses an Intel RealSense D435i depth camera for depth vision-based navigation, which enables high-fidelity 3D environmental mapping and real-time path planning. A key contribution is the customization of EGO-Planner to optimize trajectory planning in dynamically changing terrain, coupled with a multi-body dynamics model that improves the robot's stability and maneuverability across various surfaces. The experimental results show that the RGB-D system achieves better velocity stability and trajectory accuracy than the SLAM system, with a 20% reduction in cumulative velocity error and a 10% improvement in path tracking precision. The RGB-D system also navigates more smoothly, requiring 15% fewer iterations for path planning and recovering its success rate 30% faster in challenging environments. The successful application of these technologies in simulated urban disaster scenarios suggests promising future applications in emergency response and complex urban environments.

Part two presents a robust path planning algorithm for a robot dog on rough terrain, based on navigation with an attached binocular vision system. We use a commercial off-the-shelf (COTS) robot dog. An optical CCD binocular vision dynamic tracking system provides environment information. The position and posture of the robot dog are obtained from the robot's own sensors, and a kinematics model is established. Then, a binocular vision tracking method is developed to determine the optimal path, propose the position and posture of the bionic robot (as commands to the actuators), and achieve stable motion on rough terrain. The terrain is initially assumed to be gently uneven and is subsequently made rougher. This work consists of four steps: (1) position and posture data are acquired from the robot dog's own inertial sensors, (2) terrain and environment information is input from the onboard cameras, (3) the information is fused (integrated), and (4) path planning and motion control proposals are made. Ultimately, this work provides a robust framework for future developments in the vision-based navigation and control of quadruped robots, offering potential solutions for navigating complex and dynamic terrains.
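The four steps listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the fusion rule or the planner, so the sketch assumes a camera-derived height grid, a single body pitch angle from the inertial sensors, a hypothetical tilt-scaled step cost as the "fusion", and Dijkstra's algorithm as a stand-in planner. The names `edge_cost`, `plan_path`, and `slope_weight` are illustrative only.

```python
import heapq
import math

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def edge_cost(h_from, h_to, body_pitch_rad, slope_weight=5.0):
    """Step 3 (assumed fusion rule): combine the camera-derived height
    difference with the IMU pitch. A tilted body makes height steps costlier."""
    tilt_penalty = 1.0 + abs(math.sin(body_pitch_rad))
    return 1.0 + slope_weight * tilt_penalty * abs(h_to - h_from)

def plan_path(heights, body_pitch_rad, start, goal):
    """Step 4 (stand-in planner): Dijkstra over a 4-connected height grid.

    heights        -- step 2: terrain heights from the cameras (hypothetical grid)
    body_pitch_rad -- step 1: pitch from the robot's inertial sensors
    Returns the list of (row, col) cells from start to goal, or [] if unreachable.
    """
    rows, cols = len(heights), len(heights[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + edge_cost(
                    heights[r][c], heights[nr][nc], body_pitch_rad)
                if new_dist < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new_dist
                    parent[(nr, nc)] = cell
                    heapq.heappush(queue, (new_dist, (nr, nc)))
    if goal not in best:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

# Example: a 1 m bump in the middle of an otherwise flat 3x3 patch.
heights = [[0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0]]
path = plan_path(heights, body_pitch_rad=0.0, start=(1, 0), goal=(1, 2))
```

In this toy setup the planner routes around the bump because climbing it costs more than the detour. A real system would replace the grid with the mapped terrain and pass the resulting waypoints to the motion controller.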

CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Development of a Human-Robot Hybrid Intelligent System Based on Brain Teleoperation and Deep Learning SLAM

CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI

Decision-Making in Robotic Grasping with Large Language Models

DogSurf: Quadruped Robot Capable of GRU-based Surface Recognition for Blind Person Navigation

LaMI: Large Language Models for Multi-Modal Human-Robot Interaction

Large Language Models for Robotics: Opportunities, Challenges, and Perspectives

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots

Quadrupedal Robotic Guide Dog with Vocal Human-Robot Interaction

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Path Planning and Motion Control of Robot Dog Through Rough Terrain Based on Vision Navigation

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models

A Smart Interactive Camera Robot Based on Large Language Models

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Open-World Object Manipulation using Pre-trained Vision-Language Models

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model