PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4

Seif Abukhalaf, Mohammad Hamdaqa, Foutse Khomh
2024-06-07
Abstract: The rapid progress of AI-powered programming assistants, such as GitHub Copilot, has facilitated the development of software applications. These assistants rely on large language models (LLMs), which are foundation models (FMs) that support a wide range of tasks related to understanding and generating language. LLMs have demonstrated their ability to express UML model specifications using formal languages like the Object Constraint Language (OCL). However, the context size of the prompt is limited by the number of tokens an LLM can process. This limitation becomes significant as the size of UML class models increases. In this study, we introduce PathOCL, a novel path-based prompt augmentation technique designed to facilitate OCL generation. PathOCL addresses the limitations of LLMs, specifically their token processing limit and the challenges posed by large UML class models. PathOCL is based on the concept of chunking, which selectively augments the prompts with a subset of UML classes relevant to the English specification. Our findings demonstrate that PathOCL, compared to augmenting the complete UML class model (UML-Augmentation), generates a higher number of valid and correct OCL constraints using the GPT-4 model. Moreover, the average prompt size crafted using PathOCL significantly decreases when scaling the size of the UML class models.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limited ability of large language models (LLMs) to generate effective Object Constraint Language (OCL) constraints for large UML class models, a limitation caused by their bounded context windows. Specifically:

1. **Limited context window**: LLMs can process only a limited number of tokens (text fragments) per prompt. When a UML class model becomes very large, the entire model can no longer be included in the prompt, which reduces the effectiveness of OCL constraint generation.
2. **Complexity of large UML class models**: As UML class models grow, writing and verifying OCL constraints by hand becomes increasingly difficult, especially for novice practitioners. In addition, syntactically different OCL constraints can have the same semantics, which makes it harder to select the best one.

To solve these problems, the authors propose PathOCL, a path-based prompt augmentation technique that generates OCL constraints by augmenting the prompt with only the subset of UML classes relevant to the English specification. The main contributions of PathOCL are:

- **Smaller context**: Because the prompt includes only the UML classes relevant to the English specification, PathOCL significantly reduces prompt size, enabling the LLM to handle large UML class models more effectively.
- **More valid and correct constraints**: Experiments with GPT-4 show that PathOCL generates more valid and correct OCL constraints than augmenting the prompt with the complete UML class model (UML-Augmentation).
- **Lower inference cost**: Beyond improving generation quality, PathOCL reduces inference cost and scales better across UML class models of different sizes, since its average prompt size grows much more slowly than the full model.
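The summary does not spell out how PathOCL picks the relevant classes, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It assumes that class names are matched against the English specification by their CamelCase words, that the class model is treated as an undirected graph of associations, and that every class on a shortest path between two mentioned classes is kept. The toy model (`Customer`, `CustomerCard`, `Membership`, ...), the helper names (`mentioned_classes`, `select_subset`, `render`) and the text serialization are all hypothetical, and character length stands in for token count.

```python
import re
from collections import deque
from itertools import combinations

# Hypothetical toy UML class model: class name -> attribute declarations.
CLASSES = {
    "Customer": ["name: String", "age: Integer"],
    "CustomerCard": ["valid: Boolean"],
    "Membership": [],
    "LoyaltyProgram": ["name: String"],
    "LoyaltyAccount": ["points: Integer"],
    "Transaction": ["amount: Real"],
    "ServiceLevel": ["name: String"],
}
# Hypothetical associations as (source class, role name, target class).
ASSOCIATIONS = [
    ("LoyaltyProgram", "memberships", "Membership"),
    ("LoyaltyProgram", "levels", "ServiceLevel"),
    ("Membership", "account", "LoyaltyAccount"),
    ("Membership", "card", "CustomerCard"),
    ("CustomerCard", "owner", "Customer"),
    ("LoyaltyAccount", "transactions", "Transaction"),
]


def mentioned_classes(classes, spec):
    """Classes whose CamelCase words appear, in order, in the English spec."""
    found = []
    for name in classes:
        words = re.findall(r"[A-Z][a-z]*|[a-z]+", name)
        pattern = r"\b" + r"\s*".join(map(re.escape, words))
        if re.search(pattern, spec, flags=re.IGNORECASE):
            found.append(name)
    return found


def build_adjacency(classes, associations):
    adj = {name: set() for name in classes}
    for src, _role, dst in associations:
        # Assumption: associations are navigable both ways for path search.
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the class names on one shortest path."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []  # goal unreachable from start


def select_subset(classes, associations, spec):
    """Mentioned classes plus every class on a shortest path between them."""
    adj = build_adjacency(classes, associations)
    mentioned = mentioned_classes(classes, spec)
    selected = set(mentioned)
    for a, b in combinations(mentioned, 2):
        selected.update(shortest_path(adj, a, b))
    return sorted(selected)


def render(classes, associations, names):
    """Serialize the chosen classes and the associations among them."""
    keep = set(names)
    lines = [f"class {n} {{ {'; '.join(classes[n])} }}" for n in sorted(keep)]
    lines += [f"association {s}.{r} -> {t}" for s, r, t in associations
              if s in keep and t in keep]
    return "\n".join(lines)


spec = "A customer enrolled through a membership must be at least 18 years old."
subset = select_subset(CLASSES, ASSOCIATIONS, spec)
full_prompt = render(CLASSES, ASSOCIATIONS, CLASSES)  # UML-Augmentation style
path_prompt = render(CLASSES, ASSOCIATIONS, subset)   # path-based subset
print(subset)  # ['Customer', 'CustomerCard', 'Membership']
print(len(path_prompt), "vs", len(full_prompt), "characters")
```

Here the specification names only `Customer` and `Membership`, yet the path search also pulls in `CustomerCard`, the class that connects them, while unrelated classes such as `LoyaltyProgram` and `Transaction` stay out of the prompt. Shortest paths are one simple way to keep the subset small; the real technique may enumerate or rank paths differently.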
In conclusion, by restructuring the prompt, PathOCL overcomes the bottleneck that existing approaches face with large UML class models, improving both the quality and the efficiency of OCL constraint generation.