Combining LLM Code Generation with Formal Specifications and Reactive Program Synthesis

William Murphy, Nikolaus Holzer, Feitong Qiao, Leyi Cui, Raven Rothkopf, Nathan Koenig, Mark Santolucito
2024-09-18
Abstract: In the past few years, Large Language Models (LLMs) have exploded in usefulness and popularity for code generation tasks. However, LLMs still struggle with accuracy and are unsuitable for high-risk applications without additional oversight and verification. In particular, they perform poorly at generating code for highly complex systems, especially with unusual or out-of-sample logic. For such systems, verifying the code generated by the LLM may take longer than writing it by hand. We introduce a solution that divides the code generation into two parts; one to be handled by an LLM and one to be handled by formal methods-based program synthesis. We develop a benchmark to test our solution and show that our method allows the pipeline to solve problems previously intractable for LLM code generation.
Software Engineering, Machine Learning, Logic in Computer Science
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to combine the code generation ability of large language models (LLMs) with program synthesis techniques from formal methods, in order to improve the correctness of generated code and reduce the amount of code that must be manually verified**. Specifically, the paper focuses on the accuracy and trust issues that arise when LLMs generate code for high-risk applications, especially their poor performance on complex systems and unconventional logic.

### Detailed Explanation

1. **Limitations of LLM Code Generation**
   - Although LLMs perform well on code generation tasks, they offer no formal correctness guarantees. In high-risk applications their output must therefore still be verified manually, which greatly reduces their advantage.
   - For highly complex systems, especially those with uncommon or out-of-sample logic, verifying LLM-generated code may take longer than writing the code by hand.
2. **Combining Formal Methods**
   - The paper proposes a solution that divides code generation into two parts: one part is handled by an LLM, and the other is handled by program synthesis based on formal methods.
   - Specifically, the authors use **Temporal Stream Logic (TSL)**, a formal language in which users specify short logical constraints on system behavior. From such constraints, reactive systems can be generated whose complexity is beyond what maintainers could easily achieve by hand.
3. **Key Contributions**
   - Proposed a framework that combines program synthesis from formal specifications with LLM code generation to reduce the amount of generated code that needs to be verified.
   - Implemented a specific code generation pipeline that uses TSL.
   - Evaluated the system on two reactive program synthesis benchmark datasets.
4. **Innovative Points**
   - Use TSL to generate code structures with "holes" that an LLM fills in later, so that the structure of the program is correct by construction.
   - By separating data from control, use the flexibility of LLMs to generate the function and predicate terms while the synthesized control logic preserves the logical correctness of the system.

### Summary

The main goal of the paper is to address the accuracy and trust issues of LLM code generation in high-risk applications, especially for complex systems, by combining the flexibility of LLMs with the rigor of formal methods. The method improves the correctness of the generated code and also reduces the amount of code that must be manually verified, which in turn improves development efficiency and code quality.
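
The separation of control and data described above can be illustrated with a small sketch. This is not the authors' pipeline and it does not call any TSL tool: the controller below is written by hand as a stand-in for what a synthesizer might emit, and every name in it (`Holes`, `make_controller`, `is_pressed`, `increment`, `run`) is hypothetical. The point is only to show the shape of the idea: the control skeleton decides *when* a function is applied, while the function and predicate "holes" are supplied separately, here as plain lambdas standing in for LLM-generated code.

```python
from dataclasses import dataclass
from typing import Callable, List

# Informal, TSL-like reading of the behavior sketched here (an assumption,
# not a specification taken from the paper):
#   always ( is_pressed(button) -> [count <- increment(count)] )
#   always ( not is_pressed(button) -> [count <- count] )


@dataclass
class Holes:
    """Function and predicate terms the control skeleton leaves unimplemented."""
    is_pressed: Callable[[int], bool]  # predicate over the input signal
    increment: Callable[[int], int]    # function applied to the state cell


def make_controller(holes: Holes) -> Callable[[int, int], int]:
    """Hand-written stand-in for a synthesized control skeleton.

    The skeleton fixes only the control flow (which update fires on which
    condition). It never inspects how the holes are implemented.
    """
    def step(count: int, button: int) -> int:
        if holes.is_pressed(button):
            return holes.increment(count)
        return count  # state cell keeps its value
    return step


def run(holes: Holes, inputs: List[int], initial: int = 0) -> List[int]:
    """Feed a stream of inputs through the controller and record the state."""
    step = make_controller(holes)
    count = initial
    trace = []
    for button in inputs:
        count = step(count, button)
        trace.append(count)
    return trace


# Stand-in for LLM-generated hole implementations (hypothetical).
llm_holes = Holes(
    is_pressed=lambda button: button == 1,
    increment=lambda count: count + 1,
)

if __name__ == "__main__":
    print(run(llm_holes, [1, 0, 1, 1]))
```

Under this split, only the small hole implementations would need manual review; the control flow in `step` is the part a formal synthesizer would be responsible for.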