Agile Methodology for the Standardization of Engineering Requirements Using Large Language Models

Archana Tikayat Ray, Bjorn F. Cole, Olivia J. Pinon Fischer, Anirudh Prabhakara Bhat, Ryan T. White, Dimitri N. Mavris
DOI: https://doi.org/10.3390/systems11070352
2023-07-10
Systems
Abstract: The increased complexity of modern systems is calling for an integrated and comprehensive approach to system design and development and, in particular, a shift toward Model-Based Systems Engineering (MBSE) approaches for system design. The requirements that serve as the foundation for these intricate systems are still primarily expressed in Natural Language (NL), which can contain ambiguities and inconsistencies and suffer from a lack of structure that hinders their direct translation into models. The colossal developments in the field of Natural Language Processing (NLP), in general, and Large Language Models (LLMs), in particular, can serve as an enabler for the conversion of NL requirements into machine-readable requirements. Doing so is expected to facilitate their standardization and use in a model-based environment. This paper discusses a two-fold strategy for converting NL requirements into machine-readable requirements using language models. The first approach involves creating a requirements table by extracting information from free-form NL requirements. The second approach consists of an agile methodology that facilitates the identification of boilerplate templates for different types of requirements based on observed linguistic patterns. For this study, three different LLMs are utilized. Two of these models are fine-tuned versions of Bidirectional Encoder Representations from Transformers (BERTs), specifically, aeroBERT-NER and aeroBERT-Classifier, which are trained on annotated aerospace corpora. Another LLM, called flair/chunk-english, is utilized to identify sentence chunks present in NL requirements. All three language models are utilized together to achieve the standardization of requirements. The effectiveness of the methodologies is demonstrated through the semi-automated creation of boilerplates for requirements from Parts 23 and 25 of Title 14 Code of Federal Regulations (CFRs).
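As an illustration of the first approach, the sketch below assumes an NER model has already tagged each token of a requirement with BIO labels, and merges the tagged spans into one row of a requirements table. The entity labels (SYS for system, VAL for value, DATETIME, ORG, RES), the column layout, the requirement ID, and the example sentence with its tags are all hypothetical choices for illustration, not the paper's actual table schema or real aeroBERT-NER output.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical column set for the requirements table; the labels mirror
# assumed NER entity types (SYS = system, VAL = value, etc.).
TABLE_COLUMNS = ["SYS", "VAL", "DATETIME", "ORG", "RES"]


def _flush(entities: dict[str, list[str]], label: str | None, words: list[str]) -> None:
    """Record the span currently being built, if there is one."""
    if label is not None and words:
        entities[label].append(" ".join(words))


def group_entities(tokens: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Merge BIO-tagged tokens into entity spans, grouped by entity type."""
    entities: dict[str, list[str]] = defaultdict(list)
    label: str | None = None
    words: list[str] = []
    for word, tag in tokens:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts (a stray I- tag is treated like B-).
            _flush(entities, label, words)
            label, words = tag[2:], [word]
        elif tag.startswith("I-"):
            # Continuation of the current span.
            words.append(word)
        else:
            # "O" (outside) closes any open span.
            _flush(entities, label, words)
            label, words = None, []
    _flush(entities, label, words)
    return dict(entities)


def to_table_row(req_id: str, tokens: list[tuple[str, str]]) -> dict[str, str]:
    """Build one requirements-table row; multiple spans of a type are joined with '; '."""
    entities = group_entities(tokens)
    row = {"id": req_id}
    for column in TABLE_COLUMNS:
        row[column] = "; ".join(entities.get(column, []))
    return row


# Hand-written, hypothetical tagging of an illustrative requirement sentence.
example_tokens = [
    ("The", "O"), ("fuel", "B-SYS"), ("tank", "I-SYS"), ("must", "O"),
    ("withstand", "O"), ("a", "O"), ("pressure", "O"), ("of", "O"),
    ("3.5", "B-VAL"), ("psi", "I-VAL"),
]

print(to_table_row("REQ-1", example_tokens))
# → {'id': 'REQ-1', 'SYS': 'fuel tank', 'VAL': '3.5 psi', 'DATETIME': '', 'ORG': '', 'RES': ''}
```

Empty cells make it visible which kinds of information a given requirement does not state, which is one practical benefit of a tabular representation.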
### What problem does this paper attempt to solve?

This paper addresses the standardization of natural language (NL) requirements in modern system design and development. Specifically, it uses large language models (LLMs) to convert NL requirements into machine-readable requirements, with the following goals:

1. **Reduce ambiguity and inconsistency**: Requirements expressed in NL often contain ambiguities and inconsistencies, which can lead to errors or misunderstandings during the development of complex systems. Standardizing these requirements improves their accuracy and consistency.
2. **Accelerate verification**: Converting NL requirements into a machine-readable format allows requirements to be verified in the early stages of development, saving time and cost and avoiding expensive rework or project abandonment later on.
3. **Promote automated integration**: Transforming NL requirements into structured tables or templates makes them easier to integrate into a Model-Based Systems Engineering (MBSE) environment, supporting tasks such as system design, architecture, implementation, and testing.
4. **Improve requirement quality**: Identifying and applying standard templates (boilerplates) improves requirement quality by reducing ambiguity, enhancing readability, and helping different stakeholders interpret requirements consistently.

To this end, the paper proposes a two-fold strategy:

- **First approach**: Create a requirements table by extracting information from free-form NL requirements.
- **Second approach**: Apply an agile methodology to identify boilerplate templates for different types of requirements based on observed linguistic patterns.

The paper uses three LLMs to achieve this goal:

- **aeroBERT-NER**: identifies named entities in requirements (a BERT model fine-tuned on annotated aerospace corpora).
- **aeroBERT-Classifier**: classifies requirements by type (also a fine-tuned BERT model trained on annotated aerospace corpora).
- **flair/chunk-english**: identifies sentence chunks.

Used together, these models standardize requirements, thereby better supporting the development and verification of modern complex systems.
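The second approach, identifying boilerplates from observed linguistic patterns, can be illustrated by grouping requirements by their type label and their sequence of chunk tags, then surfacing the recurring patterns as candidate templates. This is a minimal sketch assuming the type labels ("design", "functional", "performance") and the chunk tags (NP, VP, PP, SBAR) were already produced by a classifier and a chunker; the data is hand-written and hypothetical, and the `min_count` threshold is an illustrative parameter, not a value from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical pre-processed requirements: (type label, chunk-tag sequence).
# In practice these would come from a requirement classifier and a chunker;
# here they are hand-written for illustration only.
requirements = [
    ("design", ["NP", "VP", "NP"]),
    ("design", ["NP", "VP", "NP"]),
    ("design", ["NP", "VP", "NP", "PP", "NP"]),
    ("performance", ["NP", "VP", "NP", "PP", "NP"]),
    ("performance", ["NP", "VP", "NP", "PP", "NP"]),
    ("functional", ["SBAR", "NP", "VP", "NP"]),
]


def candidate_boilerplates(reqs, min_count=2):
    """Return, per requirement type, the chunk patterns seen at least
    min_count times, most frequent first. Each surviving pattern is a
    candidate boilerplate skeleton for a human to review and refine."""
    counts = defaultdict(Counter)
    for req_type, chunks in reqs:
        # Use the space-joined tag sequence as the pattern key.
        counts[req_type][" ".join(chunks)] += 1
    return {
        req_type: [(pattern, n) for pattern, n in counter.most_common() if n >= min_count]
        for req_type, counter in counts.items()
    }


candidates = candidate_boilerplates(requirements)
print(candidates)
# → {'design': [('NP VP NP', 2)], 'performance': [('NP VP NP PP NP', 2)], 'functional': []}
```

Keying on the type label as well as the chunk pattern means a design requirement and a performance requirement with the same syntax yield separate candidate templates; this grouping is a choice made for the sketch. Lowering `min_count` in a later iteration surfaces rarer patterns for manual review.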