Abstract: Grammatical Error Correction (GEC) refers to the automatic identification and amendment of grammatical, spelling, punctuation, and word-positioning errors in monolingual texts. Neural Machine Translation (NMT) is currently one of the most valuable techniques for GEC, but it can suffer from scarce training data and domain shift, depending on the language addressed. However, current techniques for tackling the data sparsity problem associated with NMT (e.g., tuning pre-trained language models or developing spell-confusion methods without focusing on language diversity) create mismatched data distributions. This paper proposes new aggressive transformation approaches to augment data during training that extend the distribution of authentic data. In particular, it uses augmented data as auxiliary tasks to provide new contexts when the target prefix is not helpful for next-word prediction. This enhances the encoder and steadily increases its contribution by forcing the GEC model to pay more attention to the encoder's text representations during decoding. The impact of these approaches was investigated using a Transformer-based model on a low-resource GEC task, with Arabic GEC as a case study. GEC models trained with our data rely more on source information, are more robust to domain shift, and produce fewer hallucinations with tiny training datasets and under domain shift. Experimental results showed that the proposed approaches outperformed the baseline, the most common data augmentation methods, and classical synthetic data approaches. In addition, a combination of the three best approaches, Misspelling, Swap, and Reverse, achieved the best F1 score on two benchmarks and outperformed previous Arabic GEC approaches.
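
To make the three named transformations concrete, the following is a minimal Python sketch of what token-level Misspelling, Swap, and Reverse operations could look like. The abstract does not define them, so everything here is an assumption for illustration: the function names, the `rate` parameter, the choice of adjacent-character transposition as the misspelling noise, and the use of whitespace tokens. The sketch also leaves open which side of a training pair the transformation is applied to and how the auxiliary task is signaled to the model, since the abstract does not say.

```python
import random


def misspell(tokens, rng, rate=0.3):
    """Hypothetical misspelling noise: transpose two adjacent characters
    in a token with probability `rate` (assumed, not from the paper)."""
    out = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < rate:
            i = rng.randrange(len(tok) - 1)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        out.append(tok)
    return out


def swap(tokens, rng):
    """Hypothetical swap: exchange one randomly chosen pair of adjacent tokens."""
    out = list(tokens)
    if len(out) >= 2:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def reverse(tokens):
    """Hypothetical reverse: return the token sequence in reverse order."""
    return list(reversed(tokens))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sentence = "the model corrects the sentence".split()
    print(misspell(sentence, rng))
    print(swap(sentence, rng))
    print(reverse(sentence))
```

Each function returns a new list and leaves its input unchanged, so the transformations can be applied independently or chained to produce several auxiliary variants of the same sentence.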
