Abstract: Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. This capability has been a cornerstone of past technological revolutions, which relied on combining components to create increasingly sophisticated and reliable systems. Cars, airplanes, computers, and software consist of components (such as engines, wheels, CPUs, and libraries) that can be assembled, debugged, and replaced. A key tool for building such reliable and modular systems is specification: the precise description of the expected behavior, inputs, and outputs of each component. However, the generality of LLMs and the inherent ambiguity of natural language make defining specifications for LLM-based components (e.g., agents) both a challenging and urgent problem. In this paper, we discuss the progress the field has made so far, through advances like structured outputs, process supervision, and test-time compute, and outline several future directions for research to enable the development of modular and reliable LLM-based systems through improved specifications.

What problem does this paper attempt to address?

The core problem this paper addresses is how to turn the development of systems based on large language models (LLMs) into an engineering discipline through well-defined specifications, so that modular and reliable LLM systems can be built. Specifically, the paper focuses on the following aspects:

1. **Lack of clear specifications**
   - The generality of LLMs and the inherent ambiguity of natural language make it difficult to define clear task specifications. Users specify tasks through natural language prompts, but these prompts are often vague and ill-defined, which leads the model to generate wrong or unexpected results (for example, "hallucinations").
2. **Building reliable and modular systems**
   - Traditional engineering disciplines (such as mechanical engineering and software engineering) rely on modular design and the combination of components to create complex and reliable systems. Most current LLM systems, however, are monolithic and hard to design and debug in a modular way. This limits their reliability and extensibility.
3. **Automated decision-making**
   - A reliable system needs to make decisions automatically, without human intervention, which is crucial for many practical applications. Because clear specifications are lacking, current LLM systems often rely on humans to evaluate the quality of their output and cannot be fully automated.
4. **Limitations of existing solutions**
   - Existing methods (such as structured outputs, process supervision, and test-time compute) improve the performance of LLM systems, but they are still insufficient for complex real-world tasks. Further research and new methods are therefore needed to improve the quality of task specifications.
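
To make the link between structured outputs and automated decision-making concrete, the following is a minimal sketch rather than anything taken from the paper. It assumes a hypothetical contract, `REQUIRED_FIELDS`, that a model's JSON response must satisfy, and a hypothetical helper, `check_structured_output`, that accepts or rejects a response with no human in the loop.

```python
import json
from typing import Tuple

# Hypothetical structured-output contract (an assumption for illustration):
# field name -> required Python type.
REQUIRED_FIELDS = {"title": str, "year": int, "keywords": list}


def check_structured_output(raw: str) -> Tuple[bool, str]:
    """Return (accepted, reason) for a raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return False, f"missing field: {name}"
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False, f"wrong type for field: {name}"
    return True, "ok"
```

For instance, `check_structured_output('{"title": "Specs", "year": "2024", "keywords": []}')` returns `(False, "wrong type for field: year")`, so a downstream component can retry or escalate automatically. A check like this only constrains the form of the output, not whether its content is correct, which is consistent with the point that such methods alone fall short on complex tasks.
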
### Overview of the solution

The paper proposes that introducing explicit **statement specifications** and **solution specifications** can significantly improve the reliability and modularity of LLM systems. Specifically:

- **Statement specifications** describe what a task should do, that is, the goal and expected behavior of the task.
- **Solution specifications** describe how to verify whether the solution of a task complies with the statement specification.

Combining these two kinds of specification helps LLM systems perform tasks with higher accuracy and reliability, and makes them easier to debug and improve. Clear specifications also promote interoperability and reusability between components, moving LLM system development in a more modular, engineering-oriented direction.

### Future research directions

The paper also points out some future research directions, including but not limited to:

- Developing more powerful tools and techniques to help users write and verify task specifications.
- Researching how to apply existing software engineering practices to the design and development of LLM systems.
- Exploring new methods to reduce ambiguity in task specifications, especially when dealing with natural language input.

In short, this paper aims to provide a theoretical basis and technical support for building reliable, modular, and automatable LLM systems by introducing clear specifications, thus promoting the development and application of LLM technology.
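
The distinction between statement and solution specifications can be illustrated with a small sketch. Everything here is an assumption for illustration rather than the paper's own formalism: `TaskSpec`, `is_sorted_permutation`, and `run_component` are hypothetical names, and the sorting task stands in for a task an LLM-based component might perform.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskSpec:
    # Statement specification: what the task should do.
    statement: str
    # Solution specification: how to verify a candidate output for an input.
    verify: Callable[[List[int], List[int]], bool]


def is_sorted_permutation(inp: List[int], out: List[int]) -> bool:
    """True if `out` holds exactly the items of `inp`, in ascending order."""
    same_items = Counter(inp) == Counter(out)
    ascending = all(a <= b for a, b in zip(out, out[1:]))
    return same_items and ascending


SORT_SPEC = TaskSpec(
    statement="Return the input integers in ascending order.",
    verify=is_sorted_permutation,
)


def run_component(
    spec: TaskSpec,
    component: Callable[[List[int]], List[int]],
    inp: List[int],
) -> List[int]:
    """Run a component (in practice, possibly an LLM call) and check its output."""
    out = component(inp)
    if not spec.verify(inp, out):
        raise ValueError(f"output violates specification: {spec.statement}")
    return out
```

Here `run_component(SORT_SPEC, sorted, [3, 1, 2])` passes verification, while a component that drops an element raises `ValueError`. Because the verifier is independent of how the component is implemented, components can be swapped or debugged without changing the specification.
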

Specifications: The missing link to making the development of LLM systems an engineering discipline
