Abstract: Grammatical error correction aims to correct ungrammatical sentences automatically. Recent work has demonstrated the excellent capabilities of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduce GrammarGPT, an open-source LLM, to preliminarily explore its potential for native Chinese grammatical error correction (CGEC). The core recipe of GrammarGPT is to leverage a hybrid dataset of ChatGPT-generated and human-annotated data. For grammatical errors with clues, we propose a heuristic method that guides ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collect ungrammatical sentences from publicly available websites and manually correct them. In addition, we employ an error-invariant augmentation method to enhance the model's ability to correct native Chinese grammatical errors. We ultimately constructed about 1k parallel sentence pairs and used them to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT significantly outperforms the existing SOTA system. Although the model has 20x more parameters than the SOTA baseline, it requires 1200x less data for instruction tuning, illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT ranks $3^{rd}$ on NLPCC2023 SharedTask1, demonstrating the effectiveness of our approach. The code and data are available at \url{https://github.com/FreedomIntelligence/GrammarGPT}.

GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
