Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality

Cathy Mengying Fang, Krzysztof Zieliński, Pattie Maes, Joe Paradiso, Bruce Blumberg, Mikkel Baun Kjærgaard
2024-07-17
Abstract: Programming a robot is a complex task, as it demands that the user have a good command of specific programming languages and an awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also employs Augmented Reality (AR) to provide visual feedback of the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task, which we implement on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and learn new skills (e.g., object grasping).
Human-Computer Interaction, Robotics
What problem does this paper attempt to address?
The paper addresses the complexity of robot programming, especially for users without programming expertise. It proposes a framework that simplifies the programming of collaborative robots by combining large language models (LLMs) with Augmented Reality (AR). The main issues it targets are:

1. **Lowering the Programming Threshold**: Traditional robot programming requires users to master specific programming languages and understand the robot's physical limitations. This is a significant barrier for Small and Medium-sized Enterprises (SMEs), which often lack this expertise. The framework replaces this step with natural language input.
2. **Natural Language Control**: Users control the robot through simple voice commands without memorizing complex command sets, so non-expert users can interact with it easily.
3. **Augmented Reality Feedback**: The framework uses AR to display the robot's planned path in real time within the user's field of view, so users can check and confirm that the planned actions match their expectations.
4. **Automatic Skill Generation**: Beyond basic path planning, the paper presents an early concept of using generative AI models to produce expressive robot behaviors (such as nodding or shaking the head) and new skills (e.g., object grasping), making human-robot interaction more natural and flexible.

Together, these methods aim to reduce the complexity of robot programming so that people without a professional background can use and control collaborative robots, promoting automation in a broader range of fields.
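As a rough illustration of the waypoint-generation and checking steps described above, the following Python sketch parses a model reply into waypoints and flags any that fall outside a simple reach limit before they would be previewed or executed. It is not the authors' implementation: the JSON reply format, the `Waypoint` fields, the `MAX_REACH_M` value, and the function names are all assumptions made for illustration, and the model reply is a hard-coded string rather than the output of a real LLM call.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Waypoint:
    x: float  # metres, robot base frame (assumed convention)
    y: float
    z: float
    gripper: str  # "open" or "closed"


MAX_REACH_M = 0.85  # hypothetical reach limit, not taken from the paper


def parse_waypoints(llm_reply: str) -> list[Waypoint]:
    """Parse a JSON reply of the assumed form {"waypoints": [...]}."""
    data = json.loads(llm_reply)
    return [
        Waypoint(float(w["x"]), float(w["y"]), float(w["z"]), str(w["gripper"]))
        for w in data["waypoints"]
    ]


def unreachable(waypoints: list[Waypoint], max_reach: float = MAX_REACH_M) -> list[int]:
    """Return indices of waypoints below the table plane or beyond the reach limit."""
    bad = []
    for i, w in enumerate(waypoints):
        dist = math.sqrt(w.x**2 + w.y**2 + w.z**2)
        if w.z < 0.0 or dist > max_reach:
            bad.append(i)
    return bad


# Hard-coded stand-in for a model reply to a pick-and-place request;
# no LLM is called in this sketch.
EXAMPLE_REPLY = """
{"waypoints": [
  {"x": 0.40, "y": 0.10, "z": 0.20, "gripper": "open"},
  {"x": 0.40, "y": 0.10, "z": 0.05, "gripper": "closed"},
  {"x": 0.40, "y": -0.20, "z": 0.20, "gripper": "closed"},
  {"x": 0.90, "y": -0.20, "z": 0.05, "gripper": "open"}
]}
"""

if __name__ == "__main__":
    plan = parse_waypoints(EXAMPLE_REPLY)
    print("Waypoints flagged as unreachable:", unreachable(plan))
```

In a setup like the one the paper describes, a check of this kind would sit between the model's output and the AR preview, so the user only confirms plans that already respect the robot's basic physical limits. A real system would use the robot's kinematic model rather than a single distance threshold.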