Abstract: Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and an awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also employs Augmented Reality (AR) to provide visual feedback of the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task, which we implement on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and learn new skills (e.g., object grasping).

What problem does this paper attempt to address?

The paper aims to address the complexity of robot programming, especially for users without professional programming knowledge. Specifically, the paper proposes a new framework that simplifies the programming process of collaborative robots through Natural Language Processing (NLP) and Augmented Reality (AR) technologies. The main issues the paper attempts to solve are as follows:

1. **Lowering the Programming Threshold**: Traditional robot programming requires users to master specific programming languages and understand the physical limitations of robots. This is a significant barrier for Small and Medium-sized Enterprises (SMEs), as they often lack this expertise. Therefore, the framework simplifies this process through natural language input.
2. **Natural Language Control**: Users can control robots through simple voice commands without memorizing complex command sets. This allows non-professional users to easily interact with robots.
3. **Augmented Reality Feedback**: The framework utilizes AR technology to display the robot's path planning results in real time within the user's field of view, enabling users to intuitively check and confirm whether the robot's actions meet their expectations.
4. **Automatic Skill Generation**: In addition to basic path planning, the paper also explores how to use generative AI models to automatically generate the robot's expressive behaviors (such as nodding or shaking its head) to enhance the naturalness and flexibility of human-robot interaction.

Through these methods, the paper hopes to reduce the complexity of robot programming, allowing more people without a professional background to use and control collaborative robots, thereby promoting the application of automation technology in a broader range of fields.
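To make the waypoint-generation step more concrete, the following is a minimal sketch of how an LLM reply for a pick-and-place task might be parsed and checked before it is shown to the user for confirmation. It is not the paper's implementation: the JSON reply format, the `Waypoint` fields, the `WORKSPACE` limits, and the `parse_waypoints` function are all hypothetical and chosen only for illustration. No actual LLM or AR call is made; the reply is a hard-coded example string.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Waypoint:
    """One target pose for the end effector (hypothetical format)."""
    x: float
    y: float
    z: float
    gripper: str  # "open" or "closed"


# Hypothetical reachable workspace in metres (assumed values, not from the paper).
WORKSPACE = {"x": (0.0, 0.8), "y": (-0.5, 0.5), "z": (0.0, 0.6)}


def parse_waypoints(llm_reply: str) -> List[Waypoint]:
    """Parse a JSON reply and reject waypoints outside the workspace."""
    raw = json.loads(llm_reply)
    waypoints = []
    for item in raw["waypoints"]:
        wp = Waypoint(
            float(item["x"]), float(item["y"]), float(item["z"]), item["gripper"]
        )
        if wp.gripper not in ("open", "closed"):
            raise ValueError(f"unknown gripper state: {wp.gripper!r}")
        for axis, (low, high) in WORKSPACE.items():
            value = getattr(wp, axis)
            if not low <= value <= high:
                raise ValueError(f"{axis}={value} is outside [{low}, {high}]")
        waypoints.append(wp)
    return waypoints


# Stand-in for what an LLM might return for "move the cube to the tray".
EXAMPLE_REPLY = json.dumps({
    "waypoints": [
        {"x": 0.40, "y": -0.20, "z": 0.30, "gripper": "open"},    # above the cube
        {"x": 0.40, "y": -0.20, "z": 0.05, "gripper": "closed"},  # grasp
        {"x": 0.55, "y": 0.25, "z": 0.30, "gripper": "closed"},   # above the tray
        {"x": 0.55, "y": 0.25, "z": 0.08, "gripper": "open"},     # release
    ]
})

if __name__ == "__main__":
    for index, wp in enumerate(parse_waypoints(EXAMPLE_REPLY), start=1):
        print(index, wp)
```

Validating the generated waypoints before display reflects the idea in the summary that the user confirms the planned outcome; in a real system, the validated list would be handed to the AR visualization and only then to the robot.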

Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality

Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

Immersive Assistance System for Intuitive Robot Programming using Mixed-Reality and Digital Twin

Improving Human Legibility in Collaborative Robot Tasks through Augmented Reality and Workspace Preparation

Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration

Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality

Toward Programming a Collaborative Robot by Interacting with Its Digital Twin in a Mixed Reality Environment

Spatial Assisted Human-Drone Collaborative Navigation and Interaction through Immersive Mixed Reality

Enabling Intuitive Human-Robot Teaming Using Augmented Reality and Gesture Control

Learning robot motor skills with mixed reality

An LLM-based vision and language cobot navigation approach for Human-centric Smart Manufacturing

Automatic Robotic Development through Collaborative Framework by Large Language Models

LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks

VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots

A 3D Mixed Reality Interface for Human-Robot Teaming

Mixed Reality as Communication Medium for Human-Robot Collaboration

A Novel Robot Teaching System Based on Augmented Reality

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Grounding Language Models in Autonomous Loco-manipulation Tasks

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning