Factored Statistical Machine Translation for Grammatical Error Correction

Yiming Wang, Longyue Wang, Xiaodong Zeng, Derek F. Wong, Lidia S. Chao, Yi Lu
DOI: https://doi.org/10.3115/v1/w14-1711
2014-01-01
Abstract: This paper describes our ongoing work on grammatical error correction (GEC). Focusing on all possible error types in a real-life environment, we propose a factored statistical machine translation (SMT) model for this task. We treat error correction as a series of language translation problems guided by various kinds of linguistic information, which serve as factors that influence translation results. The factors in our study are morphological information (word stem, prefix, and suffix) and part-of-speech (PoS) information. In addition, we experimented with different combinations of translation models (TMs), phrase-based and factor-based, trained on various datasets to boost overall performance. Empirical results show that the proposed model yields an improvement of 32.54% over a baseline phrase-based SMT model. The system participated in the CoNLL 2014 shared task and achieved the 7th and 5th F0.5 scores on the official test set among the thirteen participating teams.
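The F0.5 score cited in the abstract is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. The sketch below only illustrates that formula; the function name `f_beta` and the edit counts are hypothetical, chosen for the example, and are not taken from the paper or from the shared task's official scorer.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute the F-beta score from counts of proposed edits.

    tp: proposed edits that match a gold edit
    fp: proposed edits that match no gold edit
    fn: gold edits the system missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (assumed, not from the paper):
# precision = 30 / 50 = 0.6, recall = 30 / 100 = 0.3
score_f05 = f_beta(tp=30, fp=20, fn=70, beta=0.5)
score_f1 = f_beta(tp=30, fp=20, fn=70, beta=1.0)
print(f"F0.5 = {score_f05:.3f}, F1 = {score_f1:.3f}")
```

With these assumed counts, F0.5 comes out higher than F1 because the hypothetical system is more precise than it is complete, which is the behavior a precision-weighted metric rewards.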