Abstract: Robots are increasingly used in dynamic environments such as workplaces, hospitals, and homes. As a result, interactions with robots must be simple and intuitive, and robots' perception must adapt efficiently to human-induced changes. This paper presents a robot control architecture that addresses key challenges in human-robot interaction, with a particular focus on the dynamic creation and continuous update of the robot's state representation. The architecture uses Large Language Models (LLMs) to integrate diverse information sources, including natural language commands, a representation of robotic skills, and a real-time dynamic semantic map of the perceived scene. This enables flexible and adaptive robotic behavior in complex, dynamic environments. Traditional robotic systems often rely on static, pre-programmed instructions and settings, which limits their adaptability to changing environments and real-time collaboration. In contrast, this architecture uses LLMs to interpret complex, high-level instructions and generate actionable plans that enhance human-robot collaboration. At its core, the system's Perception Module generates and continuously updates a semantic scene graph from RGB-D sensor data, providing a detailed and structured representation of the environment. A particle filter is employed to ensure accurate object localization in dynamic, real-world settings. The Planner Module leverages this up-to-date semantic map to break down high-level tasks into sub-tasks and link them to robotic skills such as navigation, object manipulation (e.g., PICK and PLACE), and movement (e.g., GOTO). By combining real-time perception, state tracking, and LLM-driven communication and task planning, the architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
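
The abstract mentions a particle filter for object localization but gives no details. The following is a minimal, generic sketch of one predict-weight-resample cycle for a single object's 2D position, not the authors' implementation. The function names (`particle_filter_step`, `estimate`), the noise parameters, the static-object motion model, and the Gaussian measurement model are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, measurement,
                         motion_noise=0.05, meas_noise=0.2, rng=random):
    """Run one predict-weight-resample cycle (hypothetical helper).

    particles   -- list of (x, y) position hypotheses for one object
    measurement -- observed (x, y) position, e.g. from an RGB-D detection
    """
    # Predict: assume the object is mostly static, so only add small jitter.
    predicted = [(x + rng.gauss(0, motion_noise), y + rng.gauss(0, motion_noise))
                 for x, y in particles]

    # Weight: Gaussian likelihood of the measurement given each particle.
    mx, my = measurement
    weights = [math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * meas_noise ** 2))
               for x, y in predicted]

    if sum(weights) == 0.0:
        # Every particle is implausible (e.g. the object was moved by a human):
        # re-initialise the cloud around the new measurement.
        return [(mx + rng.gauss(0, meas_noise), my + rng.gauss(0, meas_noise))
                for _ in predicted]

    # Resample: draw particles with probability proportional to their weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Return the mean particle position as the point estimate."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


# Usage: track a static object at an assumed true position from noisy detections.
rng = random.Random(0)
true_pos = (2.0, 3.0)
particles = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(500)]
for _ in range(20):
    z = (true_pos[0] + rng.gauss(0, 0.2), true_pos[1] + rng.gauss(0, 0.2))
    particles = particle_filter_step(particles, z, rng=rng)
print(estimate(particles))
```

In a scene-graph setting, one such filter per object node would keep the node's position estimate stable under noisy detections. The re-initialisation branch is one simple way to react when an object is moved abruptly.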

Adaptive and transparent decision-making in autonomous robots through graph-structured world models

Decision-Making in Robotic Grasping with Large Language Models

Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs

LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics

Large Language Model As Autonomous Decision Maker

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

REAL: Resilience and Adaptation using Large Language Models on Autonomous Aerial Robots

Grounding Language Models in Autonomous Loco-manipulation Tasks

Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy

Evaluating World Models with LLM for Decision Making

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation

Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning

Learning adaptive planning representations with natural language guidance

Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

Towards Human Awareness in Robot Task Planning with Large Language Models

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents

ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs