Automatic Chinese Spelling Checking and Correction Based on Character-Based Pre-trained Contextual Representations.

Haihua Xie, Aolin Li, Yabo Li, Jing Cheng, Zhiyou Chen, Xiaoqing Lyu, Zhi Tang
DOI: https://doi.org/10.1007/978-3-030-32236-6_49
2019-01-01
Abstract: Automatic Chinese spelling checking and correction (CSC) remains a challenging task, especially when a sentence is complex in its semantics and expressions. Moreover, a CSC model normally requires a large training corpus, which is usually unavailable. To capture the semantic information of sentences, this paper proposes an approach, named DPL-Corr, based on character-based pre-trained contextual representations, which significantly improves CSC performance. In DPL-Corr, the spelling checking module is a sequence-labeling model enhanced by deep contextual semantic analysis, and the spelling correction module is a masked language model integrated with multilayer filtering to obtain the final corrections. In experiments on the SIGHAN 2015 dataset, DPL-Corr achieves significantly better CSC performance than conventional models.
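The two-module design described in the abstract (a checker that flags suspicious characters, then a masked language model whose proposals are filtered) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names, the toy detector and toy masked-LM stand-ins, the confusion set, and the two filters (confusion-set membership and a probability threshold) are hypothetical, since the abstract does not specify what the filtering layers are.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces (not from the paper):
#   Detector: sentence -> indices of characters suspected to be misspelled
#   MaskedLM: (sentence, masked index) -> candidate (character, probability) pairs
Detector = Callable[[str], List[int]]
MaskedLM = Callable[[str, int], List[Tuple[str, float]]]


def correct(sentence: str, detect: Detector, mlm: MaskedLM,
            confusion: Dict[str, Set[str]], min_prob: float = 0.5) -> str:
    """Detect-then-correct loop with two illustrative filtering layers."""
    chars = list(sentence)
    for pos in detect(sentence):
        original = chars[pos]
        candidates = mlm("".join(chars), pos)
        # Filter 1 (assumed): keep only characters confusable with the original.
        allowed = confusion.get(original, set())
        candidates = [(c, p) for c, p in candidates if c in allowed]
        # Filter 2 (assumed): drop low-confidence proposals.
        candidates = [(c, p) for c, p in candidates if p >= min_prob]
        if candidates:
            chars[pos] = max(candidates, key=lambda cp: cp[1])[0]
    return "".join(chars)


# Toy stand-ins so the sketch runs without any trained model.
def toy_detect(sentence: str) -> List[int]:
    i = sentence.find("高心")  # flags the second character of this one known error
    return [i + 1] if i >= 0 else []


def toy_mlm(sentence: str, pos: int) -> List[Tuple[str, float]]:
    return [("兴", 0.9), ("大", 0.05)]  # fixed proposals; context is ignored


CONFUSION = {"心": {"兴", "新"}}  # hypothetical set of similar-sounding characters

fixed = correct("我今天很高心", toy_detect, toy_mlm, CONFUSION)
```

In a real system the two stand-ins would be replaced by a trained sequence-labeling model and a pre-trained masked language model. Keeping them behind plain callables makes the filtering logic easy to test in isolation.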