Abstract: Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile-manipulation commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot’s environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a “sensor,” inferring spatial, topological, and semantic information implicit in natural-language utterances and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic language-grounding model and infer a distribution over a symbolic representation of the robot’s action space that is consistent with the utterance. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of navigation and mobile-manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating that the algorithm can follow natural-language instructions without prior knowledge of the environment.

Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model

Grounding Language Models in Autonomous Loco-manipulation Tasks

One to rule them all: natural language to bind communication, perception and action

HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks

Towards Human Awareness in Robot Task Planning with Large Language Models

Large Language Models for Orchestrating Bimanual Robots

Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

Versatile multicontact planning and control for legged loco-manipulation

Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

Leveraging Large Language Models for Comprehensive Locomotion Control in Humanoid Robots Design

Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls

GG-LLM: Geometrically Grounding Large Language Models for Zero-shot Human Activity Forecasting in Human-Aware Task Planning

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration

Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3)

Lifelong Robot Learning with Human Assisted Language Planners

Language Understanding for Field and Service Robots in a Priori Unknown Environments