Abstract: Large language models are useful for robot task planning and task decomposition. However, applying them directly to instruct robots in task execution raises several problems: they struggle with more intricate tasks, they interact poorly with the environment, and the machine control instructions they generate directly are often not executable in practice. To address these problems, this research proposes a multi-layer large language model that improves a robot's ability to handle complex tasks. The proposed model decomposes tasks layer by layer by integrating multiple large language models, with the goal of improving the accuracy of task planning. During task decomposition, a visual language model serves as a sensor for environment perception. The perception results are fed into the large language model, which combines the task objectives with environmental information and generates robot motion planning tailored to the current environment. To make the task planning outputs of the large language model more executable, a semantic alignment method is introduced. This method aligns task planning descriptions with the functional requirements of robot motion, improving the compatibility and coherence of the generated instructions. To validate the approach, an experimental platform is built on an intelligent unmanned vehicle and used to empirically verify that the multi-layer large language model can address the challenges of both robot task planning and execution.
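
The pipeline described in the abstract (layer-by-layer decomposition, perception output merged into planning, and semantic alignment to robot functions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the planners are hard-coded stubs standing in for LLM calls, the `scene` string stands in for visual language model output, the primitive names in `PRIMITIVES` are hypothetical, and word overlap is used as a simple stand-in for whatever semantic alignment method the authors use.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a planner maps one step to a list of finer-grained steps.
# In a real system each planner would be a call to a large language model.
Planner = Callable[[str], List[str]]

# Hypothetical motion primitives of the vehicle, each with a few keywords.
PRIMITIVES: Dict[str, str] = {
    "move_forward": "move drive go forward ahead",
    "turn_left": "turn rotate left",
    "turn_right": "turn rotate right",
    "stop": "stop halt wait",
}


def decompose(task: str, layers: List[Planner]) -> List[str]:
    """Refine a task layer by layer: each layer expands every current step."""
    steps = [task]
    for planner in layers:
        steps = [sub for step in steps for sub in planner(step)]
    return steps


def align(step: str) -> Optional[str]:
    """Map a free-text step to the primitive sharing the most keywords with it.

    Word overlap is an assumed stand-in for semantic alignment.
    Returns None when no primitive shares any word with the step.
    """
    words = set(step.lower().split())

    def overlap(name: str) -> int:
        return len(words & set(PRIMITIVES[name].split()))

    best = max(PRIMITIVES, key=overlap)
    return best if overlap(best) > 0 else None


# Stand-in for the visual language model's description of the environment.
scene = "obstacle on the right"


def top_layer(step: str) -> List[str]:
    """Stub for the upper-layer model: splits the task into sub-goals."""
    table = {"go to the door": ["approach the door", "halt at the door"]}
    return table.get(step, [step])


def low_layer(step: str) -> List[str]:
    """Stub for the lower-layer model: merges a sub-goal with the scene."""
    if step == "approach the door":
        if "right" in scene:
            return ["turn left", "go forward"]
        return ["go forward"]
    return [step]


steps = decompose("go to the door", [top_layer, low_layer])
plan = [align(s) for s in steps]
print(plan)
```

In this sketch a step that matches no primitive is returned as `None`, so a caller can detect plan text that has no executable counterpart instead of sending it to the robot.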

Decision-Making in Robotic Grasping with Large Language Models

Large Language Models for Robotics: Opportunities, Challenges, and Perspectives

Large Language Models for Robotics: A Survey

Reasoning Grasping via Multimodal Large Language Model

Lan-grasp: Using Large Language Models for Semantic Object Grasping

Enhancing Robot Task Planning and Execution through Multi-Layer Large Language Models

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

RT-Grasp: Reasoning Tuning Robotic Grasping via Multi-modal Large Language Model

A Smart Interactive Camera Robot Based on Large Language Models

Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model

Grounding Language Models in Autonomous Loco-manipulation Tasks

A Survey on Integration of Large Language Models with Intelligent Robots

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Interpreting and learning voice commands with a Large Language Model for a robot system

Statler: State-Maintaining Language Models for Embodied Reasoning

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks