Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models

Angelos Mavrogiannis, Christoforos Mavrogiannis, Yiannis Aloimonos
2024-03-07
Abstract: Cooking recipes are challenging to translate to robot plans as they feature rich linguistic complexity, temporally-extended interconnected tasks, and an almost infinite space of possible actions. Our key insight is that combining a source of cooking domain knowledge with a formalism that captures the temporal richness of cooking recipes could enable the extraction of unambiguous, robot-executable plans. In this work, we use Linear Temporal Logic (LTL) as a formal language expressive enough to model the temporal nature of cooking recipes. Leveraging a pretrained Large Language Model (LLM), we present Cook2LTL, a system that translates instruction steps from an arbitrary cooking recipe found on the internet to a set of LTL formulae, grounding high-level cooking actions to a set of primitive actions that are executable by a manipulator in a kitchen environment. Cook2LTL makes use of a caching scheme that dynamically builds a queryable action library at runtime. We instantiate Cook2LTL in a realistic simulation environment (AI2-THOR), and evaluate its performance across a series of cooking recipes. We demonstrate that our system significantly decreases LLM API calls (-51%), latency (-59%), and cost (-42%) compared to a baseline that queries the LLM for every newly encountered action at runtime.
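
To make the role of LTL concrete, here is a small illustrative formula. It is not taken from the paper: the instruction "boil the pasta until tender, then drain it" and the propositions $\mathrm{boil}$, $\mathrm{tender}$, and $\mathrm{drain}$ are assumptions chosen for this sketch. Using the standard LTL operators $\mathcal{F}$ ("eventually") and $\mathcal{U}$ ("until"), one plausible encoding is:

$$
\varphi = \mathcal{F}\Big(\mathrm{boil}(\mathrm{pasta}) \;\mathcal{U}\; \big(\mathrm{tender}(\mathrm{pasta}) \wedge \mathcal{F}\,\mathrm{drain}(\mathrm{pasta})\big)\Big)
$$

Read from the outside in: at some point boiling starts and keeps holding until the pasta is tender, and from that moment draining eventually happens. Placing $\mathcal{F}\,\mathrm{drain}$ inside the right-hand side of $\mathcal{U}$ is what rules out draining before the pasta is tender.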
Robotics
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the challenge of converting cooking recipes into executable plans for robots. Cooking recipes feature rich linguistic complexity, temporally interrelated tasks, and an almost infinite action space, which makes them very difficult to translate directly into robot action instructions. The main contribution of the paper is a system called **Cook2LTL**, which converts the instruction steps of an arbitrary cooking recipe found on the internet into Linear Temporal Logic (LTL) formulas, producing unambiguous and executable robot task specifications.

### Key Issues

1. **Linguistic Complexity**: The language in cooking recipes is often ambiguous and carries a lot of implicit information, such as omitted arguments that must be inferred from context (the zero anaphora problem) or steps left out on the assumption that the reader has the background knowledge to fill them in.
2. **Temporal Ordering**: Cooking tasks have an inherent temporal order, and recipes often express it through explicit temporal language (e.g., "until," "before") that must be parsed correctly for the tasks to be executed in the right order.
3. **Action Diversity**: Cooking tasks involve a wide variety of actions, and many high-level actions need to be decomposed into a series of primitive actions before a robot can execute them.

### Solutions

To address the above challenges, the paper proposes the following solutions:

1. **Using Linear Temporal Logic (LTL)**: LTL is a formal language expressive enough to capture the temporal characteristics of cooking tasks, which allows the system to generate unambiguous robot task specifications.
2. **Using a Large Language Model (LLM)**: A pretrained LLM decomposes high-level cooking actions into primitive actions executable by the robot, from which the corresponding LTL formulas are generated.
3. **Dynamically Building an Action Library**: Through a caching mechanism, the system gradually builds a queryable action library at runtime, which reduces the number of LLM calls, lowers latency, and lowers cost.

### Experimental Validation

The paper instantiates Cook2LTL in the AI2-THOR simulation environment and evaluates it across a series of cooking recipes. Compared to a baseline that queries the LLM for every newly encountered action at runtime, Cook2LTL reduces the number of LLM API calls by 51%, latency by 59%, and cost by 42%. The paper also demonstrates Cook2LTL performing simple cooking tasks in a simulated kitchen, which indicates its potential for practical applications.

### Conclusion

By combining LTL with an LLM, Cook2LTL addresses the challenge of converting cooking recipes into executable robot plans, offering a new approach to applying household robots to everyday cooking tasks.
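
The caching idea described under Solutions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the primitive action names, the `ActionLibrary` class, the `fake_llm` stand-in for an LLM query, and the `to_ltl` helper are all hypothetical. The nested "eventually" pattern used in `to_ltl` is one common way to encode a sequence in LTL and may differ from the exact form used in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical primitive action set (placeholder names, not the paper's list).
PRIMITIVES = {"pickup", "put", "slice", "open", "close", "toggle_on"}


class ActionLibrary:
    """Cache mapping a high-level action to its primitive decomposition."""

    def __init__(self, decompose: Callable[[str], List[str]]) -> None:
        self._decompose = decompose  # stand-in for a call to an LLM
        self._cache: Dict[str, List[str]] = {}
        self.llm_calls = 0  # counts how often the stand-in LLM is queried

    def resolve(self, action: str) -> List[str]:
        """Return primitives for `action`, querying the LLM only on a cache miss."""
        if action in PRIMITIVES:
            return [action]
        if action not in self._cache:
            self.llm_calls += 1
            self._cache[action] = self._decompose(action)
        return self._cache[action]


def fake_llm(action: str) -> List[str]:
    """Canned decompositions standing in for a real LLM response."""
    table = {
        "chop": ["pickup", "slice", "put"],
        "microwave": ["open", "put", "close", "toggle_on"],
    }
    return table[action]


def to_ltl(primitives: List[str]) -> str:
    """Encode a sequence as nested 'eventually' operators: F (a1 & F (a2 & F a3))."""
    if not primitives:
        raise ValueError("need at least one primitive action")
    formula = f"F {primitives[-1]}"
    for name in reversed(primitives[:-1]):
        formula = f"F ({name} & {formula})"
    return formula


if __name__ == "__main__":
    library = ActionLibrary(fake_llm)
    for step in ["chop", "pickup", "chop", "microwave", "chop"]:
        print(step, "->", to_ltl(library.resolve(step)))
    print("LLM calls:", library.llm_calls)
```

In this toy run, five recipe steps trigger only two queries to the stand-in LLM, because `pickup` is already a primitive and the repeated `chop` steps are served from the cache. The savings reported in the paper come from the same effect at larger scale, although the actual figures depend on the recipes and the model used.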