nl2spec: Interactively Translating Unstructured Natural Language to Temporal Logics with Large Language Models

Matthias Cosler, Christopher Hahn, Daniel Mendoza, Frederik Schmitt, Caroline Trippel
2023-03-09
Abstract: A rigorous formalization of desired system requirements is indispensable when performing any verification task. This often limits the application of verification techniques, as writing formal specifications is an error-prone and time-consuming manual task. To facilitate this, we present nl2spec, a framework for applying Large Language Models (LLMs) to derive formal specifications (in temporal logics) from unstructured natural language. In particular, we introduce a new methodology to detect and resolve the inherent ambiguity of system requirements in natural language: we utilize LLMs to map subformulas of the formalization back to the corresponding natural language fragments of the input. Users iteratively add, delete, and edit these sub-translations to amend erroneous formalizations, which is easier than manually redrafting the entire formalization. The framework is agnostic to specific application domains and can be extended to similar specification languages and new neural models. We perform a user study to obtain a challenging dataset, which we use to run experiments on the quality of translations. We provide an open-source implementation, including a web-based frontend.
Logic in Computer Science, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to translate natural-language system requirements into formal specifications in temporal logic, such as LTL. The authors propose nl2spec, a framework that uses large language models (LLMs) to derive formal specifications from unstructured natural language. Its key contribution is a method for detecting and resolving the ambiguity inherent in natural-language requirements: the LLM maps each subformula of the candidate formalization back to the fragment of the input it translates, and users iteratively add, delete, or edit these sub-translations to correct an erroneous formalization, which is easier than redrafting the entire formula by hand. The framework is agnostic to specific application domains and can be extended to similar specification languages and new neural models. To evaluate it, the authors ran a user study to collect a challenging dataset and used it in experiments measuring translation quality, that is, how well nl2spec handles unstructured and ambiguous natural language. They also provide an open-source implementation with a web-based frontend.
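The editing loop described above can be pictured as a small data structure holding (natural-language fragment, subformula) pairs that the user adds, deletes, or edits before the next query to the model. The sketch below is hypothetical: the classes `SubTranslation` and `TranslationSession`, the prompt-hint format, and the example sub-translations are illustrative assumptions, not nl2spec's actual implementation or prompts. It uses the standard LTL operators `G` (globally), `F` (eventually), and `X` (next).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubTranslation:
    """One pair: a fragment of the input sentence and the LTL subformula it maps to."""
    fragment: str
    subformula: str


@dataclass
class TranslationSession:
    """Hypothetical container for the sub-translations shown to a user."""
    items: list[SubTranslation] = field(default_factory=list)

    def add(self, fragment: str, subformula: str) -> None:
        # Append a new (fragment, subformula) pair supplied by the model or the user.
        self.items.append(SubTranslation(fragment, subformula))

    def delete(self, fragment: str) -> None:
        # Drop every pair whose fragment matches exactly.
        self.items = [s for s in self.items if s.fragment != fragment]

    def edit(self, fragment: str, new_subformula: str) -> None:
        # Replace the subformula of every pair whose fragment matches exactly.
        for s in self.items:
            if s.fragment == fragment:
                s.subformula = new_subformula

    def as_prompt_hints(self) -> str:
        # Render the user-approved pairs as one line each; this text format is an
        # assumption about how corrections could be fed back into the next LLM prompt.
        return "\n".join(f'"{s.fragment}" translates to "{s.subformula}"' for s in self.items)


if __name__ == "__main__":
    # Input: "Every time a request occurs, it is eventually followed by a grant."
    session = TranslationSession()
    session.add("every time", "G")
    session.add("a request occurs", "r")
    # Suppose the model misread "eventually" as "next":
    session.add("is eventually followed by a grant", "X g")
    # The user corrects only this sub-translation instead of redrafting G (r -> F g).
    session.edit("is eventually followed by a grant", "F g")
    print(session.as_prompt_hints())
```

Keeping corrections at the level of fragments mirrors the abstract's point: fixing one wrong subformula is easier than rewriting the whole specification.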