Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction

Masahiro Kaneko, Naoaki Okazaki
2023-09-21
Abstract: In Grammatical Error Correction (GEC), it is crucial to ensure the user's comprehension of a reason for correction. Existing studies present tokens, examples, and hints as to the basis for correction but do not directly explain the reasons for corrections. Although methods that use Large Language Models (LLMs) to provide direct explanations in natural language have been proposed for various tasks, no such method exists for GEC. Generating explanations for GEC corrections involves aligning input and output tokens, identifying correction points, and presenting corresponding explanations consistently. However, it is not straightforward to specify a complex format to generate explanations, because explicit control of generation is difficult with prompts. This study introduces a method called controlled generation with Prompt Insertion (PI) so that LLMs can explain the reasons for corrections in natural language. In PI, LLMs first correct the input text, and then we automatically extract the correction points based on the rules. The extracted correction points are sequentially inserted into the LLM's explanation output as prompts, guiding the LLMs to generate explanations for the correction points. We also create an Explainable GEC (XGEC) dataset of correction reasons by annotating NUCLE, CoNLL2013, and CoNLL2014. Although generations from GPT-3 and ChatGPT using original prompts miss some correction points, the generation control using PI can explicitly guide to describe explanations for all correction points, contributing to improved performance in generating correction reasons.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to generate natural language explanations in grammatical error correction (GEC) so that users can understand why each correction was made. Existing work presents tokens, examples, and hints as the basis for a correction but does not directly explain the reason for it. Large language models (LLMs) have been used to generate natural language explanations in other tasks, but no such method exists for GEC. One obstacle is that explaining GEC corrections requires aligning input and output tokens, identifying the correction points, and explaining each one consistently, and such a complex output format is hard to enforce with a prompt alone. The paper proposes "Controlled Generation with Prompt Insertion (PI)": the LLM first corrects the input text, the correction points are extracted automatically by rules, and each point is then inserted in sequence into the LLM's explanation output as a prompt, so that the model generates an explanation for every correction point. The paper also introduces a new dataset, XGEC, of correction reasons.

In summary, the main contributions of the paper are:

1. **Proposing a new method**: PI controls LLM generation by inserting the extracted correction points as prompts, so that every correction point receives a corresponding explanation.
2. **Creating a new dataset**: The Explainable GEC (XGEC) dataset is built by annotating NUCLE, CoNLL2013, and CoNLL2014 with correction reasons.
3. **Experimental validation of the method's effectiveness**: Generations from GPT-3 and ChatGPT using the original prompts miss some correction points, whereas PI guides the models to explain all correction points, which improves performance in generating correction reasons.
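
The PI procedure described above (correct, extract correction points, insert each point as a prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Python's `difflib.SequenceMatcher` stands in for the paper's rule-based extraction, whose exact rules are not given here. The `generate` callable, the function names, and the `N. before -> after:` prompt format are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# (source span, corrected span); an empty string marks an insertion or deletion.
Edit = Tuple[str, str]


def extract_edits(source: str, corrected: str) -> List[Edit]:
    """Align whitespace tokens and return the spans that differ.

    Assumption: difflib alignment is a stand-in for the paper's
    rule-based extraction of correction points.
    """
    src, tgt = source.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits: List[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


def explain_with_prompt_insertion(
    source: str,
    corrected: str,
    generate: Callable[[str], str],
) -> str:
    """Insert each correction point into the output, then let the model continue.

    `generate` is a hypothetical callable that takes the text so far and
    returns the model's continuation (one explanation).
    """
    context = f"Input: {source}\nCorrected: {corrected}\nExplanations:\n"
    edits = extract_edits(source, corrected)
    for n, (before, after) in enumerate(edits, start=1):
        # The inserted prompt names the correction point, so the model
        # cannot skip it; the prompt format itself is an assumption.
        context += f"{n}. {before or '(nothing)'} -> {after or '(nothing)'}: "
        context += generate(context).strip() + "\n"
    return context
```

Because the loop iterates over the extracted edits rather than relying on the model to enumerate them, every correction point gets an explanation slot. To try the sketch without a model, pass a stub such as `lambda text: "placeholder reason"` as `generate`.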