Abstract: In the past few years, Large Language Models (LLMs) have exploded in usefulness and popularity for code generation tasks. However, LLMs still struggle with accuracy and are unsuitable for high-risk applications without additional oversight and verification. In particular, they perform poorly at generating code for highly complex systems, especially those with unusual or out-of-sample logic. For such systems, verifying the code generated by the LLM may take longer than writing it by hand. We introduce a solution that divides code generation into two parts: one handled by an LLM and one handled by formal-methods-based program synthesis. We develop a benchmark to test our solution and show that our method allows the pipeline to solve problems previously intractable for LLM code generation.
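The two-part split in the abstract can be pictured with a minimal Python sketch. The sketch rests on stated assumptions. The controller is hand-written and stands in for code a formal-methods synthesizer would produce. The term implementations stand in for LLM output. The names `controller_step`, `llm_terms`, `check_holes`, `run` and `REQUIRED_HOLES` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical hole names: predicate and function terms that the control
# part refers to but does not implement itself.
REQUIRED_HOLES = {"is_reset", "is_pressed", "zero", "inc"}


def controller_step(state: int, event: str, terms: Dict[str, Callable]) -> int:
    """Control part (stand-in for synthesized code): decides WHICH term to
    apply, but never computes data itself."""
    if terms["is_reset"](event):
        return terms["zero"]()
    if terms["is_pressed"](event):
        return terms["inc"](state)
    return state  # no predicate holds: keep the current value


# Data part (stand-in for LLM-generated code): concrete bodies for the holes.
llm_terms: Dict[str, Callable] = {
    "is_reset": lambda e: e == "reset",
    "is_pressed": lambda e: e == "press",
    "zero": lambda: 0,
    "inc": lambda x: x + 1,
}


def check_holes(terms: Dict[str, Callable]) -> None:
    """Fail early if any hole the controller relies on is left unfilled."""
    missing = REQUIRED_HOLES - set(terms)
    if missing:
        raise ValueError(f"unfilled holes: {sorted(missing)}")


def run(events: List[str], terms: Dict[str, Callable]) -> List[int]:
    """Drive the controller over an event sequence and record each state."""
    check_holes(terms)
    state = terms["zero"]()
    trace = []
    for event in events:
        state = controller_step(state, event, terms)
        trace.append(state)
    return trace


print(run(["press", "press", "idle", "reset", "press"], llm_terms))
# prints [1, 2, 2, 0, 1]
```

Swapping in a different `llm_terms` dictionary changes only the data computations. The decision structure in `controller_step` stays fixed. This mirrors the motivation in the abstract: only the smaller, data-level terms would come from the LLM and need review.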

What problem does this paper attempt to address?

The problem this paper addresses is **how to combine the code-generation ability of large language models (LLMs) with formal-methods-based program synthesis, so as to improve the correctness of generated code and reduce the amount of code that must be manually verified**. Specifically, the paper focuses on the accuracy and trust issues that arise when LLMs generate code for high-risk applications, in particular their poor performance on complex systems and unconventional logic.

### Detailed Explanation

1. **Limitations of LLM Code Generation**
   - Although LLMs perform well on code-generation tasks, the code they produce lacks formal correctness guarantees. In high-risk applications it must therefore still be verified manually, which erodes much of their advantage.
   - For highly complex systems, especially those with uncommon or out-of-sample logic, verifying LLM-generated code may take longer than writing it by hand.
2. **Combining Formal Methods**
   - The paper divides code generation into two parts: one handled by an LLM and one handled by formal-methods-based program synthesis.
   - Specifically, the authors use **Temporal Stream Logic (TSL)**, a formal language in which users write short logical constraints on system behavior. From such specifications, reactive systems can be synthesized whose complexity exceeds what a maintainer could easily write by hand.
3. **Key Contributions**
   - A framework that combines program synthesis from formal specifications with LLM code generation to reduce the amount of generated code that must be verified.
   - A concrete code-generation pipeline built on TSL.
   - An evaluation of the system on two reactive-program-synthesis benchmark datasets.
4. **Innovative Points**
   - TSL synthesis produces a code structure with "holes" that an LLM fills in later, which ensures that the overall structure is correct.
   - By separating data from control, the approach uses the flexibility of LLMs to generate function and predicate terms while preserving the logical correctness of the system.

### Summary

The paper's main goal is to address the accuracy and trust issues of LLM code generation in high-risk applications, especially for complex systems, by combining the flexibility of LLMs with the rigor of formal methods. The approach improves the correctness of the generated code and reduces the amount of code that must be verified by hand, which in turn improves development efficiency and code quality.
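To make the "holes" and the separation of data and control concrete, here is a hypothetical TSL specification for a button-driven counter. The notation follows standard TSL. Update terms $[\mathit{count} \leftarrowtail \cdot\,]$ assign a new value to the cell $\mathit{count}$. The predicates $\mathit{pressed}$ and $\mathit{reset}$ are uninterpreted, and so are the functions $\mathit{inc}$ and $\mathit{zero}$. All names are illustrative assumptions, not taken from the paper.

$$
\begin{aligned}
&\mathsf{G}\big(\mathit{reset}\;\mathit{btn} \rightarrow [\mathit{count} \leftarrowtail \mathit{zero}()]\big)\\
&\mathsf{G}\big((\mathit{pressed}\;\mathit{btn} \land \lnot\,\mathit{reset}\;\mathit{btn}) \rightarrow [\mathit{count} \leftarrowtail \mathit{inc}\;\mathit{count}]\big)\\
&\mathsf{G}\big((\lnot\,\mathit{pressed}\;\mathit{btn} \land \lnot\,\mathit{reset}\;\mathit{btn}) \rightarrow [\mathit{count} \leftarrowtail \mathit{count}]\big)
\end{aligned}
$$

The specification is read as the conjunction of the three lines. Giving $\mathit{reset}$ priority keeps the cases disjoint, because in TSL each cell receives exactly one update per time step. Synthesis from such a specification fixes only the control decisions, that is, which update fires for each combination of predicate values. The bodies of $\mathit{pressed}$, $\mathit{reset}$, $\mathit{inc}$ and $\mathit{zero}$ remain the holes that, in the approach summarized above, an LLM would implement.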

Combining LLM Code Generation with Formal Specifications and Reactive Program Synthesis

Towards Automated Verification of LLM-Synthesized C Programs

Large Language Models Synergize with Automated Machine Learning

Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models

Guiding LLM Temporal Logic Generation with Explicit Separation of Data and Control

Towards Large Language Model Aided Program Refinement

Planning-Driven Programming: A Large Language Model Programming Workflow

Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

Benchmarking Large Language Models for Automated Verilog RTL Code Generation

Fully Autonomous Programming with Large Language Models

A Multi-Expert Large Language Model Architecture for Verilog Code Generation

VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models

Large Language Models for Code Analysis: Do LLMs Really Do Their Job?

Are LLMs Any Good for High-Level Synthesis?

Guiding Enumerative Program Synthesis with Large Language Models

A Survey on Large Language Models for Code Generation

The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation

Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents