Prompting open-source and commercial language models for grammatical error correction of English learner text

Christopher Davis, Andrew Caines, Øistein Andersen, Shiva Taslimipoor, Helen Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei, Paula Buttery
2024-01-15
Abstract: Thanks to recent advances in generative AI, we are able to prompt large language models (LLMs) to produce texts which are fluent and grammatical. In addition, it has been shown that we can elicit attempts at grammatical error correction (GEC) from LLMs when prompted with ungrammatical input sentences. We evaluate how well LLMs can perform at GEC by measuring their performance on established benchmark datasets. We go beyond previous studies, which only examined GPT* models on a selection of English GEC datasets, by evaluating seven open-source and three commercial LLMs on four established GEC benchmarks. We investigate model performance and report results against individual error types. Our results indicate that LLMs do not always outperform supervised English GEC models except in specific contexts -- namely commercial LLMs on benchmarks annotated with fluency corrections as opposed to minimal edits. We find that several open-source models outperform commercial ones on minimal edit benchmarks, and that in some settings zero-shot prompting is just as competitive as few-shot prompting.
Computation and Language
What problem does this paper attempt to address?
The paper evaluates how well large language models (LLMs) perform English grammatical error correction (GEC) and compares them with existing supervised GEC models. Specifically, the researchers focus on the following points:

1. **Scope of Evaluation**: Previous studies examined only GPT-series models on a selection of English GEC datasets. This study evaluates seven open-source and three commercial LLMs on four established GEC benchmarks.
2. **Evaluation Method**: The study prompts LLMs, in both zero-shot and few-shot settings, to make minimal-edit corrections. This style of correction keeps the original expression and word choice and fixes only grammatical errors, rather than rewriting the text for fluency.
3. **Performance Comparison**: The researchers report results by individual error type and assess whether LLMs can surpass supervised GEC models in specific contexts.
4. **Educational Applications**: The paper also explores the value of these models in education, particularly how they can assist second-language learners with English writing through instant feedback, automatic grading, and personalized learning.

In summary, the core question of the paper is how effective and applicable LLMs are for English grammatical error correction, especially in educational technology.
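To make the difference between zero-shot and few-shot prompting concrete, the following is a minimal sketch of how a minimal-edit GEC prompt might be assembled. The instruction wording, the `Input:`/`Output:` layout, and the function name `build_gec_prompt` are assumptions made for illustration; they are not the prompts used in the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompts are not reproduced here.
INSTRUCTION = (
    "Correct the grammatical errors in the following sentence. "
    "Make only the minimal edits needed and keep the original wording "
    "wherever possible. Return only the corrected sentence."
)


def build_gec_prompt(
    sentence: str,
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt.

    Each example is an (ungrammatical, corrected) sentence pair that is
    shown to the model before the sentence to be corrected.
    """
    parts = [INSTRUCTION, ""]
    for source, corrected in examples or []:
        parts += [f"Input: {source}", f"Output: {corrected}", ""]
    # The final block leaves "Output:" open for the model to complete.
    parts += [f"Input: {sentence}", "Output:"]
    return "\n".join(parts)


if __name__ == "__main__":
    # Zero-shot: instruction plus the target sentence only.
    print(build_gec_prompt("She go to school every day."))
    print("-" * 40)
    # Few-shot: the same instruction preceded by one worked example.
    print(build_gec_prompt(
        "She go to school every day.",
        examples=[("I has a dog.", "I have a dog.")],
    ))
```

The only difference between the two settings in this sketch is whether worked correction pairs precede the target sentence; the string returned would then be sent to whichever model is being evaluated.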