Grammatical Error Correction: More Data with More Context

Kevin Parnow, Zuchao Li, Hai Zhao
DOI: https://doi.org/10.1109/ialp51396.2020.9310498
2020-01-01
Abstract: Grammatical Error Correction (GEC) suffers from a severe scarcity of data, both annotated and unannotated, because humans do not intentionally make grammatical errors. To address this, we make use of the plentiful unlabeled plain text available: we augment a dataset with artificial noise to increase our effective training data and pre-train our model as a denoising autoencoder (DAE), which offers an intuitive data augmentation solution for GEC. In a novel approach, we enhance our DAE, a Transformer model, with a cross-document context mechanism, using a parallel encoder to encode the cross-document context and then fusing the contexts of the two encoders in the decoder. The cross-document context is supplied by combining document similarity metrics with any unlabeled plain text. This serves as a new method of equipping a GEC model with supplemental context and allows it to glean grammatical information from a separate plain text corpus. We evaluate our model on the CoNLL-2014 GEC Shared Task and achieve results that approach the state of the art for single models, showing great potential given how plentiful and readily available plain text is.
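The two data-side ideas in the abstract (noising clean text for DAE pre-training, and picking supplemental context by document similarity) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the abstract does not state which noise operations or which similarity metrics are used, so the sketch substitutes token deletion, substitution, and adjacent swaps for the noise, and Jaccard overlap for the similarity metric. The names `add_noise`, `jaccard`, and `retrieve_context` are hypothetical.

```python
import random


def add_noise(tokens, vocab, p_drop=0.1, p_sub=0.1, p_swap=0.1, seed=None):
    """Return a noised copy of `tokens` (assumed noise scheme, not the paper's).

    Each token is dropped with probability p_drop, replaced by a random
    vocabulary item with probability p_sub, and otherwise kept. Afterwards,
    adjacent tokens are swapped with probability p_swap.
    """
    rng = random.Random(seed)
    noised = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue  # deletion
        if r < p_drop + p_sub:
            noised.append(rng.choice(vocab))  # substitution
        else:
            noised.append(tok)  # keep unchanged
    i = 0
    while i < len(noised) - 1:
        if rng.random() < p_swap:
            # swap this pair, then skip past it so no token moves twice
            noised[i], noised[i + 1] = noised[i + 1], noised[i]
            i += 2
        else:
            i += 1
    return noised


def jaccard(a, b):
    """Jaccard overlap of two token lists (a stand-in similarity metric)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve_context(source, corpus, k=1):
    """Return the k corpus sentences most similar to `source`."""
    ranked = sorted(corpus, key=lambda sent: jaccard(source, sent), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    clean = "she goes to school every day".split()
    vocab = ["go", "went", "a", "the", "in", "at"]
    noisy = add_noise(clean, vocab, seed=0)
    # (noisy, clean) forms one synthetic source/target pair for pre-training.
    print(noisy, "->", clean)

    corpus = [s.split() for s in ["he goes to work every day", "the sky is blue"]]
    print(retrieve_context(clean, corpus, k=1))
```

In this sketch, each `(noisy, clean)` pair would serve as a synthetic training example, and the retrieved sentences would be the input to the parallel context encoder described in the abstract. A real system would likely use stronger similarity measures than token overlap.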