ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification

Fangwen Mu, Lin Shi, Song Wang, Zhuohao Yu, Binquan Zhang, Chenxue Wang, Shichao Liu, Qing Wang
2023-10-17
Abstract: We introduce a novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions. In particular, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving the responses to these questions, ClarifyGPT refines the ambiguous requirement and inputs it into the same LLM to generate a final code solution. To evaluate ClarifyGPT, we first conduct a human evaluation involving ten participants who use ClarifyGPT for code generation on two publicly available benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to perform large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The automated evaluation results also demonstrate that ClarifyGPT can significantly enhance code generation performance compared to the baselines. In particular, ClarifyGPT improves the average performance of GPT-4 and ChatGPT across four benchmarks from 68.02% to 75.75% and from 58.55% to 67.22%, respectively. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.
Software Engineering
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

The paper addresses the problem of ambiguous requirements that large language models (LLMs) encounter when generating code. Specifically:

1. **Current Problem**: Existing LLMs such as ChatGPT typically generate code directly from user-provided requirements, which may be ambiguous or incomplete, without asking for clarification. The generated code can therefore fail to match the user's actual intent.
2. **Solution**: The researchers propose a framework called ClarifyGPT, which identifies ambiguous requirements and asks the user targeted questions to obtain more precise information. The requirement is thus clarified before the final code is generated.
3. **Specific Methods**:
   - **Requirement Detection**: ClarifyGPT first determines whether the given requirement is ambiguous by performing a code consistency check.
   - **Questioning Mechanism**: If the requirement is deemed ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions.
   - **Code Generation**: After receiving the responses, ClarifyGPT refines the requirement and inputs it into the same LLM to generate the final code solution.
4. **Experimental Validation**: The researchers evaluated ClarifyGPT through a human evaluation with ten participants and through large-scale automated evaluation with simulated user responses. Both show improved code generation performance (Pass@1). For example, in the human evaluation ClarifyGPT raised GPT-4's Pass@1 on the MBPP-sanitized benchmark from 70.96% to 80.80%. In the automated evaluation, average performance across four benchmarks rose from 68.02% to 75.75% for GPT-4 and from 58.55% to 67.22% for ChatGPT.

In summary, the paper primarily addresses how to enable large language models to recognize ambiguous requirements, recover the user's intent through clarifying questions, and generate code that meets the clarified requirements.
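
The three steps listed above (detect ambiguity, ask questions, regenerate) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the "code consistency check" means sampling several candidate solutions and comparing their outputs on shared test inputs, which the summary does not spell out. All function and parameter names (`is_ambiguous`, `clarify_gpt`, `sample_candidates`, `ask_questions`, `answer_questions`, `generate`) are hypothetical, and the LLM calls are passed in as plain callables so the sketch runs without any model.

```python
from typing import Any, Callable, List, Sequence


def is_ambiguous(
    candidates: Sequence[Callable[[Any], Any]],
    test_inputs: Sequence[Any],
) -> bool:
    """Assumed consistency check: the requirement is treated as ambiguous
    if the candidate solutions do not all behave identically."""
    behaviours = set()
    for solve in candidates:
        outputs = []
        for x in test_inputs:
            try:
                outputs.append(repr(solve(x)))
            except Exception as exc:  # a crash counts as a distinct behaviour
                outputs.append(f"error:{type(exc).__name__}")
        behaviours.add(tuple(outputs))
    return len(behaviours) > 1


def clarify_gpt(
    requirement: str,
    sample_candidates: Callable[[str], Sequence[Callable[[Any], Any]]],
    ask_questions: Callable[[str], List[str]],
    answer_questions: Callable[[List[str]], List[str]],
    generate: Callable[[str], str],
    test_inputs: Sequence[Any],
) -> str:
    """Hypothetical pipeline: detect ambiguity, clarify, then generate."""
    candidates = sample_candidates(requirement)
    if not is_ambiguous(candidates, test_inputs):
        # Candidates agree, so generate directly from the original requirement.
        return generate(requirement)
    questions = ask_questions(requirement)
    answers = answer_questions(questions)  # a human or a simulated user
    clarifications = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in zip(questions, answers)
    )
    refined = f"{requirement}\nClarifications:\n{clarifications}"
    return generate(refined)


if __name__ == "__main__":
    # Stub "LLM" behaviour: two plausible readings of "sort the list".
    ascending = lambda xs: sorted(xs)
    descending = lambda xs: sorted(xs, reverse=True)
    result = clarify_gpt(
        "Write a function that sorts the list.",
        sample_candidates=lambda req: [ascending, descending],
        ask_questions=lambda req: ["Ascending or descending order?"],
        answer_questions=lambda qs: ["Descending"],
        generate=lambda req: f"# code generated for:\n# {req!r}",
        test_inputs=[[3, 1, 2]],
    )
    print(result)
```

In the stub run, the two candidates disagree on `[3, 1, 2]`, so the requirement is flagged as ambiguous and the refined requirement, including the question and its answer, is passed to `generate`. Replacing `answer_questions` with a model-backed function would correspond to the simulated-user setting mentioned in the abstract.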