Abstract: The rapid progress of AI-powered programming assistants, such as GitHub Copilot, has facilitated the development of software applications. These assistants rely on large language models (LLMs), which are foundation models (FMs) that support a wide range of tasks related to understanding and generating language. LLMs have demonstrated their ability to express UML model specifications using formal languages like the Object Constraint Language (OCL). However, the context size of the prompt is limited by the number of tokens an LLM can process. This limitation becomes significant as the size of UML class models increases. In this study, we introduce PathOCL, a novel path-based prompt augmentation technique designed to facilitate OCL generation. PathOCL addresses the limitations of LLMs, specifically their token processing limit and the challenges posed by large UML class models. PathOCL is based on the concept of chunking, which selectively augments the prompts with a subset of UML classes relevant to the English specification. Our findings demonstrate that PathOCL, compared to augmenting the complete UML class model (UML-Augmentation), generates a higher number of valid and correct OCL constraints using the GPT-4 model. Moreover, the average prompt size crafted using PathOCL significantly decreases when scaling the size of the UML class models.

What problem does this paper attempt to address?

The paper addresses the difficulty that large language models (LLMs) have in generating valid Object Constraint Language (OCL) constraints for large UML class models, a difficulty that stems from their limited context windows. Specifically:

1. **Limitations of the context window**: Current LLMs can process only a limited number of tokens (text fragments). When a UML class model becomes very large, it no longer fits in the prompt as a whole, which degrades the quality of the generated OCL constraints.
2. **Challenges of complex UML class models**: As UML class models grow, writing and verifying OCL constraints by hand becomes increasingly difficult, especially for novice practitioners. In addition, different OCL constraints may have the same semantics, which makes it harder to select the best one.

To solve these problems, the authors propose PathOCL, a path-based prompt augmentation technique that generates OCL constraints by selectively augmenting the prompt with the subset of UML classes relevant to the English specification. The main contributions of PathOCL include:

- **Reducing the context size**: By including only the subset of UML classes relevant to the English specification, PathOCL significantly reduces the size of the prompt, enabling the LLM to handle large UML class models more effectively.
- **Improving validity and correctness**: Experimental results with GPT-4 show that, compared with augmenting the prompt with the complete UML class model (UML-Augmentation), PathOCL generates more valid and correct OCL constraints.
- **Reducing the inference cost**: Because the prompts are smaller, PathOCL also reduces the inference cost, and it scales better as the size of the UML class models increases.

In conclusion, PathOCL overcomes the bottleneck that existing methods encounter with large UML class models by restricting the prompt to the relevant part of the model, thereby improving both the quality and the efficiency of OCL constraint generation.
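
The idea of augmenting the prompt with only a relevant subset of classes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `MODEL`, the helper names (`simple_paths`, `score`, `best_path`, `render`), the path length limit, and in particular the Jaccard word-overlap score used to rank paths. The authors' actual path extraction and ranking may differ.

```python
# Hypothetical toy class model: class name -> (attributes, associated classes).
MODEL = {
    "Customer": (["name", "age"], ["Order"]),
    "Order": (["total", "date"], ["Customer", "Product"]),
    "Product": (["price", "title"], ["Order", "Supplier"]),
    "Supplier": (["country"], ["Product"]),
    "Warehouse": (["capacity"], []),
}


def simple_paths(model, max_len=3):
    """Enumerate association paths with no repeated class, up to max_len classes."""
    paths = []

    def walk(path):
        paths.append(tuple(path))
        if len(path) == max_len:
            return
        for nxt in model[path[-1]][1]:
            if nxt not in path:
                walk(path + [nxt])

    for start in model:
        walk([start])
    return paths


def vocabulary(model, path):
    """Lower-cased class and attribute names found along a path."""
    words = set()
    for cls in path:
        words.add(cls.lower())
        words.update(attr.lower() for attr in model[cls][0])
    return words


def score(model, path, spec):
    """Jaccard overlap between specification words and path vocabulary.

    This ranking metric is an assumption of the sketch, not the paper's.
    """
    spec_words = {w.strip(".,").lower() for w in spec.split()}
    vocab = vocabulary(model, path)
    return len(spec_words & vocab) / len(spec_words | vocab)


def best_path(model, spec):
    """Return the highest-scoring path for the English specification."""
    return max(simple_paths(model), key=lambda p: score(model, p, spec))


def render(model, classes):
    """Render the given classes as plain-text context for a prompt."""
    lines = []
    for cls in classes:
        attrs, assocs = model[cls]
        lines.append(f"class {cls}({', '.join(attrs)}) -> {', '.join(assocs)}")
    return "\n".join(lines)


spec = "Every customer with an order must have an age of at least 18."
chosen = best_path(MODEL, spec)
path_prompt = render(MODEL, chosen)   # only the classes on the chosen path
full_prompt = render(MODEL, MODEL)    # the complete class model
print(chosen)
print(len(path_prompt), "<", len(full_prompt))
```

In this toy setting the path-based context contains fewer classes than the full model, which mirrors the reduction in prompt size that the summary attributes to PathOCL; the gap would widen as unrelated classes are added to the model.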

PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4
