Abstract: Grammatical Error Correction (GEC) requires a mapping between a source sequence and a target sequence that differ in only a few spans. For this reason, attention has shifted to non-autoregressive, or sequence tagging, models, in which GEC is simplified from a Seq2Seq problem to labeling the input tokens with edit commands chosen from a large edit space. Because of the large number of classes and the limited size of the available datasets, current sequence tagging approaches, which focus on a single task, still struggle to handle a broad range of grammatical errors. To this end, we simplify GEC further by dividing it into seven related subtasks: Insertion, Deletion, Merge, Substitution, Transformation, Detection, and Correction, with Correction being our primary focus. A distinct classification head is dedicated to each of these subtasks. A novel multi-head, multi-task learning model is proposed to use the training data effectively and to harness the information from the training signals of related tasks. To mitigate the limited number of available training samples, a new denoising autoencoder is used to generate a synthetic dataset for pretraining. Additionally, a new character-level transformation is proposed to enhance the sequence-to-edit function and improve the model's vocabulary coverage. Our single/ensemble model achieves an F0.5 of 74.4/77.0 on BEA-19 (test) and 68.6/69.1 on CoNLL-14 (test). Evaluated on the JFLEG test set, the single and ensemble models reach GLEU scores of 61.6 and 61.7, respectively. In most comparisons, the model outperforms recently published state-of-the-art results by a considerable margin.
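To make the sequence tagging formulation concrete, the following is a minimal sketch of how a source/target sentence pair can be turned into one edit command per input token, plus a binary detection label per token. It is an illustration only and not the edit space or alignment procedure of the paper: the tag names (`$KEEP`, `$DELETE`, `$REPLACE_x`, `$APPEND_x`), the `$START` sentinel, and the use of Python's `difflib` for alignment are all assumptions made for this example, and tokens are assumed to contain no spaces.

```python
from difflib import SequenceMatcher

START = "$START"  # hypothetical sentinel so insertions at position 0 have an anchor


def edit_tags(source, target):
    """Return one illustrative edit tag per token of [START] + source."""
    a, b = [START] + source, [START] + target
    tags = ["$KEEP"] * len(a)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":
            for i in range(i1, i2):
                tags[i] = "$DELETE"
        elif op == "insert":
            # Attach inserted tokens to the preceding (kept) source token.
            tags[i1 - 1] = "$APPEND_" + " ".join(b[j1:j2])
        elif op == "replace":
            n, m = i2 - i1, j2 - j1
            for k in range(n):
                if k >= m:
                    tags[i1 + k] = "$DELETE"  # more source than target tokens
                elif k == n - 1 and m > n:
                    # Last source token absorbs the remaining target tokens.
                    tags[i1 + k] = "$REPLACE_" + " ".join(b[j1 + k:j2])
                else:
                    tags[i1 + k] = "$REPLACE_" + b[j1 + k]
    return tags


def apply_tags(source, tags):
    """Apply the tags back to the source to recover the target tokens."""
    out = []
    for token, tag in zip([START] + source, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag.startswith("$REPLACE_"):
            out.extend(tag[len("$REPLACE_"):].split(" "))
        elif tag.startswith("$APPEND_"):
            out.append(token)
            out.extend(tag[len("$APPEND_"):].split(" "))
        # "$DELETE" emits nothing
    return out[1:]  # drop the sentinel


def detection_labels(tags):
    """Binary per-token signal: 1 if the token is touched by any edit."""
    return [int(tag != "$KEEP") for tag in tags]


source = "She go to school yesterday".split()
target = "She went to the school yesterday".split()
tags = edit_tags(source, target)
print(list(zip([START] + source, tags)))
print(detection_labels(tags))
print(apply_tags(source, tags) == target)
```

In a multi-head setup of the kind the abstract describes, label sequences like these would serve as targets for separate classification heads over a shared encoder. The split into exactly these two label types is a simplification for illustration.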

Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation

FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Grammatical Error Correction: A Survey of the State of the Art

Evaluation of really good grammatical error correction

Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule

FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation

Grammatical Error Correction for Code-Switched Sentences by Learners of English

TransGEC: Improving Grammatical Error Correction with Translationese

A Simple Recipe for Multilingual Grammatical Error Correction

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction

Revisiting Meta-evaluation for Grammatical Error Correction

TemplateGEC: Improving Grammatical Error Correction with Detection Template

A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction

Comparison of Grammatical Error Correction Using Back-Translation Models

Multi-head Sequence Tagging Model for Grammatical Error Correction

Grammatical Error Correction in Low-Resource Scenarios

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models

ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

A Simple but Effective Classification Model for Grammatical Error Correction