Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model

Jin Wang, Arturo Laurenzi, Nikos Tsagarakis
2024-08-16
Abstract: Enabling humanoid robots to perform loco-manipulation autonomously in unstructured environments is crucial for achieving embodied intelligence, and highly challenging. It requires robots to plan their actions and behaviors over long-horizon tasks while using multi-modal perception to detect deviations between task execution and the high-level plan. Recently, large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehending and processing semantic information in robot control tasks, as well as usable analytical judgment and decision-making over multi-modal inputs. To leverage LLMs for humanoid loco-manipulation, we propose a novel language-model-based framework that enables robots to autonomously plan behaviors and low-level execution from given textual instructions, while observing and correcting failures that may occur during task execution. To systematically evaluate this framework in grounding LLMs, we created a robot 'action' and 'sensing' behavior library for task planning, conducted mobile manipulation experiments in both simulated and real environments using the CENTAURO robot, and verified the effectiveness and applicability of this approach in robotic tasks with autonomous behavioral planning.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to enable humanoid robots to perform mobile manipulation tasks autonomously in unstructured environments. Specifically, the research focuses on:

1. **Autonomous Behavior Planning**: Developing a new language-model-based framework that lets a robot plan its behaviors and low-level execution from given textual instructions, and observe and correct errors that arise during task execution.
2. **Multimodal Perception and Decision Making**: Combining multimodal perception (such as vision and force sensing) so that the robot can detect discrepancies between task execution and the high-level plan, and adjust accordingly.
3. **Failure Detection and Recovery**: Proposing a mechanism that detects failures during task execution and recovers automatically through predefined behavior graphs, thereby improving task success rates.

With these methods, the paper aims to validate the framework in both simulated and real environments, particularly on long-horizon tasks that require complex planning, and to demonstrate the potential and practicality of autonomous behavior planning on humanoid robots such as CENTAURO.
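
To make the failure detection and recovery idea more concrete, here is a minimal Python sketch of a behavior graph in which each node pairs an 'action' behavior with a 'sensing' check, and a failed check routes execution to a recovery node. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the `run_graph` function, the world-state dictionary, and the simulated grasp that slips on the first attempt are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical shared world state; the paper does not specify this structure.
World = Dict[str, Any]


@dataclass
class Node:
    """One node of an illustrative behavior graph (names are assumptions)."""
    name: str
    action: Callable[[World], None]   # 'action' behavior: changes the world
    check: Callable[[World], bool]    # 'sensing' behavior: verifies the outcome
    on_success: Optional[str] = None  # next node when the check passes
    on_failure: Optional[str] = None  # recovery node when the check fails


def run_graph(nodes: Dict[str, Node], start: str, world: World,
              max_steps: int = 20) -> List[str]:
    """Run nodes until a branch ends or the step budget is exhausted."""
    trace: List[str] = []
    current: Optional[str] = start
    steps = 0
    while current is not None and steps < max_steps:
        node = nodes[current]
        node.action(world)
        ok = node.check(world)
        trace.append(f"{node.name}:{'ok' if ok else 'fail'}")
        current = node.on_success if ok else node.on_failure
        steps += 1
    return trace


def build_demo_graph() -> Dict[str, Node]:
    """Toy approach/grasp task with one recovery edge (simulated, not a robot)."""
    def approach(world: World) -> None:
        world["near_object"] = True

    def grasp(world: World) -> None:
        world["grasp_attempts"] = world.get("grasp_attempts", 0) + 1
        # Simulated slip: the first attempt fails, later attempts succeed.
        world["holding"] = world["grasp_attempts"] >= 2

    def reopen_gripper(world: World) -> None:
        world["holding"] = False

    return {
        "approach": Node("approach", approach,
                         lambda w: bool(w.get("near_object")),
                         on_success="grasp"),
        "grasp": Node("grasp", grasp,
                      lambda w: bool(w.get("holding")),
                      on_failure="reopen"),
        "reopen": Node("reopen", reopen_gripper,
                       lambda w: not w.get("holding"),
                       on_success="grasp"),
    }


if __name__ == "__main__":
    world: World = {}
    print(run_graph(build_demo_graph(), "approach", world))
```

In this toy run the first grasp check fails, the graph follows the failure edge to reopen the gripper, and the second grasp attempt passes. In a framework like the one described, the sequence of nodes would presumably come from the language model's plan and the checks from real vision or force sensing. Here both are hard-coded for illustration.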