Correcting Chinese Spelling Errors with Phonetic Pre-training

Ruiqing Zhang, Chao Pang, Chuanqiang Zhang, Shuohuan Wang, Zhongjun He, Yu Sun, Hua Wu, Haifeng Wang
DOI: https://doi.org/10.18653/v1/2021.findings-acl.198
2021-01-01
Abstract: Chinese spelling correction (CSC) is an important yet challenging task. Existing state-of-the-art methods either use only a pre-trained language model or incorporate phonological information as external knowledge. In this paper, we propose a novel end-to-end CSC model that integrates phonetic features into a language model by leveraging the powerful pre-training and fine-tuning method. Instead of conventionally masking words with a special token when training the language model, we replace words with phonetic features and their sound-alike words. We further propose an adaptive weighted objective to jointly train error detection and correction in a unified framework. Experimental results show that our model achieves significant improvements on SIGHAN datasets and outperforms the previous state-of-the-art methods.
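The replacement strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup tables `PINYIN` and `SOUND_ALIKE`, the function name `phonetic_corrupt`, and the `rate` parameter are all hypothetical, and the abstract does not state the actual replacement ratios or vocabularies. The idea shown is only that a selected character is replaced by its phonetic form or by a sound-alike character rather than by a special mask token, while a per-position label records where a replacement happened.

```python
import random

# Hypothetical toy lookup tables (assumptions for illustration only).
# A real system would use a full pinyin lexicon and a confusion set
# of sound-alike characters.
PINYIN = {"他": "ta", "她": "ta", "在": "zai", "再": "zai", "家": "jia"}
SOUND_ALIKE = {"他": ["她"], "她": ["他"], "在": ["再"], "再": ["在"]}


def phonetic_corrupt(chars, rate, rng):
    """Replace characters with their pinyin or a sound-alike character.

    Returns (corrupted, labels), where labels[i] is 1 if position i was
    replaced and 0 otherwise. Characters missing from PINYIN are never
    replaced.
    """
    corrupted, labels = [], []
    for ch in chars:
        if ch in PINYIN and rng.random() < rate:
            # Candidates: the phonetic form plus any sound-alike characters.
            options = [PINYIN[ch]] + SOUND_ALIKE.get(ch, [])
            corrupted.append(rng.choice(options))
            labels.append(1)
        else:
            corrupted.append(ch)
            labels.append(0)
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    noisy, labels = phonetic_corrupt(list("他在家"), rate=0.5, rng=rng)
    print(noisy, labels)
```

In a setup like this, the original characters would serve as correction targets and the labels as detection targets. How the two losses are weighted against each other is the adaptive objective the abstract mentions, which this sketch does not attempt to reproduce.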