Abstract: Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile-manipulation commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot’s environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a “sensor”: we infer spatial, topological, and semantic information implicit in natural-language utterances and then exploit this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic language-grounding model and infer a distribution over a symbolic representation of the robot’s action space that is consistent with the utterance. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of navigation and mobile-manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating that the algorithm can follow natural-language instructions without prior knowledge of the environment.

Identification of Unmodeled Objects from Symbolic Descriptions

Language-guided Adaptive Perception with Hierarchical Symbolic Representations for Mobile Manipulators

Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers

Online Grounding of Symbolic Planning Domains in Unknown Environments

Symbol emergence as interpersonal cross-situational learning: the emergence of lexical knowledge with combinatoriality

Model-based recognition in robot vision for monitoring built environments

Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction

Language Understanding for Field and Service Robots in a Priori Unknown Environments

Knowledge-based multimodal information fusion for role recognition and situation assessment by using mobile robot

Combining top-down spatial reasoning and bottom-up object class recognition for scene understanding

Perception and Grasping of Object Parts from Active Robot Exploration

Transferring Implicit Knowledge of Non-Visual Object Properties Across Heterogeneous Robot Morphologies

Symbolic Manipulation Planning with Discovered Object and Relational Predicates

Self-Reflective Risk-Aware Artificial Cognitive Modeling for Robot Response to Human Behaviors

Symbol emergence in robotics: a survey

Learning Multi-Object Symbols for Manipulation with Attentive Deep Effect Predictors

A Universal Semantic-Geometric Representation for Robotic Manipulation

A neuro-symbolic approach for multimodal reference expression comprehension

Tell and show: Combining multiple modalities to communicate manipulation tasks to a robot

Represent and Infer Human Theory of Mind for Human-Robot Interaction