LLM-Craft: Robotic Crafting of Elasto-Plastic Objects with Large Language Models

Alison Bartsch, Amir Barati Farimani
2024-10-01
Abstract: When humans create sculptures, we are able to reason about how geometrically we need to alter the clay state to reach our target goal. We are not computing point-wise similarity metrics, or reasoning about low-level positioning of our tools, but instead determining the higher-level changes that need to be made. In this work, we propose LLM-Craft, a novel pipeline that leverages large language models (LLMs) to iteratively reason about and generate deformation-based crafting action sequences. We simplify and couple the state and action representations to further encourage shape-based reasoning. To the best of our knowledge, LLM-Craft is the first system successfully leveraging LLMs for complex deformable object interactions. Through our experiments, we demonstrate that with the LLM-Craft framework, LLMs are able to successfully reason about the deformation behavior of elasto-plastic objects. Furthermore, we find that LLM-Craft is able to successfully create a set of simple letter shapes. Finally, we explore extending the framework to reaching more ambiguous semantic goals, such as "thinner" or "bumpy". For videos, please see our website: https://sites.google.com/andrew.cmu.edu/llmcraft
Robotics
What problem does this paper attempt to address?
This paper addresses how large language models (LLMs) can be used to plan complex deformation operations on elasto-plastic objects such as clay, specifically for robotic sculpting. Existing methods usually focus on low-level dynamics prediction or direct motion imitation. These methods can reach certain goals, but they lack a high-level understanding of material behavior and the ability to reason about it. The paper proposes a new framework, LLM-Craft, which draws on the reasoning ability of LLMs. By simplifying and coupling the state and action representations, the framework lets the LLM reason about geometry at a higher level and generate effective sequences of deformation actions. The aim is not only to reach specific shape goals, but also to explore how an LLM can handle more ambiguous semantic goals, such as "thinner" or "bumpy". The key contributions of the paper are:
1. Proposing what the authors describe as, to the best of their knowledge, the first system that successfully uses LLMs for complex interactions with deformable objects.
2. Exploring the LLM's ability to reason about semantic goals and how that ability helps in sculpting tasks.
3. Demonstrating that, within the LLM-Craft framework, LLMs can successfully reason about the deformation behavior of elasto-plastic objects.
In conclusion, this research aims to show the potential of LLMs for complex tasks that require high-level understanding, especially tasks that depend on understanding material behavior and on long-term planning.
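The iterative "reason, act, observe" loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the text here only says that the state and action representations are simplified and coupled, so the coarse occupancy grid, the "press a cell" action, the toy clay model, and every function name below are assumptions made for illustration. The planner is a stub standing in for an LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical coupled representation: the state is a coarse occupancy grid
# (1 = clay present, 0 = empty) and an action names one cell of that same grid.
Grid = List[List[int]]
Action = Tuple[int, int]


def mismatch(state: Grid, goal: Grid) -> int:
    """Count cells where state and goal disagree (used only as a stopping check)."""
    return sum(s != g for s_row, g_row in zip(state, goal) for s, g in zip(s_row, g_row))


def apply_press(state: Grid, action: Action) -> Grid:
    """Toy stand-in for the robot and the clay: pressing a cell removes clay there."""
    r, c = action
    new_state = [row[:] for row in state]
    new_state[r][c] = 0
    return new_state


def stub_planner(state: Grid, goal: Grid) -> List[Action]:
    """Placeholder for the LLM: propose up to two cells that hold clay but should not."""
    candidates = [
        (r, c)
        for r, row in enumerate(state)
        for c, value in enumerate(row)
        if value == 1 and goal[r][c] == 0
    ]
    return candidates[:2]


def craft(
    state: Grid,
    goal: Grid,
    planner: Callable[[Grid, Grid], List[Action]],
    max_rounds: int = 5,
) -> Tuple[Grid, List[Action]]:
    """Alternate between planning a short action sequence and observing the new state."""
    executed: List[Action] = []
    for _ in range(max_rounds):
        if mismatch(state, goal) == 0:
            break
        actions = planner(state, goal)
        if not actions:
            break
        for action in actions:
            state = apply_press(state, action)
            executed.append(action)
    return state, executed


# Usage: carve a rough letter "L" out of a full 3x3 block.
start = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
goal = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
final_state, executed = craft(start, goal, stub_planner)
```

Because the action refers to the same grid cells as the state, the planner never has to reason about tool poses, which is the kind of shape-level reasoning the summary attributes to the coupled representation. In a real system the stub planner would be replaced by a prompted LLM and the toy press model by the robot and a perception step.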