Abstract: The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer to generate an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's results for validation were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that the LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating and rectifying software requirements.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper explores the capabilities of large language models (LLMs) in generating software requirements specification (SRS) documents. Specifically, the researchers evaluated the performance of GPT-4 and CodeLlama in the following aspects:

1. **Generating high-quality SRS documents**:
   - Researchers used natural language prompts to have these models generate a requirements specification document for a university club management system and compared it with a baseline document written by human engineers.
   - Evaluation metrics included the completeness, consistency, non-redundancy, conciseness, and level of detail of the documents.
2. **Validating and correcting requirements specifications**:
   - Researchers also tested the models' ability to identify and correct issues in existing requirements documents.
   - Evaluation metrics included the clarity, understandability, correctness, and verifiability of the requirements.
3. **Saving development time**:
   - Through multiple experiments, researchers assessed the time saved by using LLMs to generate SRS documents compared to traditional manual methods.

### Main Research Questions

1. **RQ1: How do GPT-4 and CodeLlama perform in generating SRS documents compared to junior software engineers?**
2. **RQ2: How do GPT-4 and CodeLlama perform in validating the quality of requirements and suggesting improvements?**
3. **RQ3: How much workload can be reduced by using LLMs to generate SRS documents?**

### Experimental Design

- **Task Definition**: Generate an SRS document for a university student club management portal, involving different roles such as administrators, student council coordinators, club coordinators, and students.
- **Baseline Document**: An SRS document written by human experts and compliant with IEEE standards.
- **Model Generation**: Use GPT-4 and CodeLlama to generate SRS documents, providing detailed context and prompts.
- **Evaluation Strategy**: Anonymously share the generated SRS documents with independent reviewers for scoring based on multiple metrics.

### Experimental Results

- **Document Quality**: CodeLlama-generated documents were generally more detailed and comprehensive but sometimes overly verbose. GPT-4-generated documents were more concise but lacked detail in some parts.
- **Validation and Correction**: GPT-4 performed well in validating the quality of requirements and suggesting improvements, while CodeLlama's results were less satisfactory.
- **Time Savings**: Using LLMs significantly reduced the time and effort required by junior software engineers in generating, validating, and correcting requirements specifications.

### Conclusion

The study indicates that LLMs can play a significant role in generating and validating software requirements specifications, enhancing the efficiency and productivity of software development. However, there is still room for improvement, particularly in generating detailed and consistent documents.
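
As a rough illustration of the evaluation strategy and the time-savings question summarized above, the sketch below averages reviewer scores per criterion and computes a relative time reduction. It is a minimal sketch under stated assumptions, not the authors' procedure: the 1-5 score scale, the function names `mean_scores` and `time_saved_percent`, and all numeric values are hypothetical placeholders, not data from the paper.

```python
from statistics import mean

# Criteria named in the summary for judging generated SRS documents.
CRITERIA = [
    "completeness",
    "consistency",
    "non-redundancy",
    "conciseness",
    "level of detail",
]


def mean_scores(reviews):
    """Average each criterion over all reviewers for one document.

    `reviews` is a list of dicts mapping criterion name -> numeric score.
    """
    return {c: mean(r[c] for r in reviews) for c in CRITERIA}


def time_saved_percent(manual_minutes, assisted_minutes):
    """Relative time reduction from LLM assistance, as a percentage."""
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return 100.0 * (manual_minutes - assisted_minutes) / manual_minutes


# Placeholder scores on an assumed 1-5 scale; NOT results from the paper.
example_reviews = [
    {"completeness": 4, "consistency": 5, "non-redundancy": 3,
     "conciseness": 4, "level of detail": 3},
    {"completeness": 2, "consistency": 3, "non-redundancy": 5,
     "conciseness": 4, "level of detail": 5},
]

print(mean_scores(example_reviews))
# Placeholder durations in minutes; NOT measurements from the paper.
print(time_saved_percent(120, 30))
```

Keeping the scores as one dict per reviewer makes it easy to compare a human baseline and each model's draft by calling `mean_scores` once per document.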

Using LLMs in Software Requirements Specifications: An Empirical Evaluation

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project

"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students

Requirements are All You Need: From Requirements to Code with LLMs

State of Practice: LLMs in Software Engineering and Software Architecture

How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Analyzing LLM Usage in an Advanced Computing Class in India

LLMs for science: Usage for code generation and data analysis

Leveraging LLMs for the Quality Assurance of Software Requirements

Impact of Large Language Models on Generating Software Specifications

LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering

Can LLMs Replace Manual Annotation of Software Engineering Artifacts?

LLMs as Evaluators: A Novel Approach to Evaluate Bug Report Summarization

Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks

ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing

Towards Evaluation Guidelines for Empirical Studies involving LLMs

The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews

The emergence of Large Language Models (LLM) as a tool in literature reviews: an LLM automated systematic review

The Potential of LLMs in Automating Software Testing: From Generation to Reporting

Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques