Improve Chinese Spelling Check by Reevaluation

Shuai Wang, Lin Shang
DOI: https://doi.org/10.1007/978-3-031-05981-0_19
2022-01-01
Abstract: Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese text. Most Chinese spelling errors involve the misuse of semantically, phonetically, or graphically similar characters. Previous state-of-the-art work on CSC maps misspelled sentences directly to correct sentences. However, spelling errors, especially consecutive incorrect characters, often distort the meaning of the surrounding context, which makes it difficult for CSC models to produce correct modifications from erroneous contextual information. To address this issue, we propose a simple but effective pipeline for CSC that searches for the most appropriate candidate sentence to serve as the intended correct sentence. Specifically, candidate sentences are generated from possible error characters using the confusion set. We then reevaluate the candidate sentences to find the best one in terms of character probabilities and similarity to the original error characters. In addition, we extend the widely used confusion set; simply applying the confusion set as a filter brings a large performance improvement. The experimental results show that our approach outperforms previous methods and performs well on bi-gram errors. The code and data are available at https://github.com/zuoyecihua/CSC.
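The two stages described in the abstract (candidate generation from a confusion set, then reevaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy confusion set, the toy probability and similarity functions, and the scoring rule (sum of character log-probabilities plus a weighted log-similarity for each changed character) are all assumptions made for illustration. The abstract does not state the actual scoring formula.

```python
import math
from itertools import product


def generate_candidates(sentence, suspect_positions, confusion_set):
    """Enumerate candidate sentences by substituting confusable characters
    at the suspected error positions (hypothetical helper)."""
    options = []
    for i in suspect_positions:
        ch = sentence[i]
        # Keep the original character as an option, then its confusable characters.
        options.append([ch] + sorted(confusion_set.get(ch, set()) - {ch}))
    candidates = []
    for combo in product(*options):
        chars = list(sentence)
        for pos, new_ch in zip(suspect_positions, combo):
            chars[pos] = new_ch
        candidates.append("".join(chars))
    return candidates


def reevaluate(original, candidates, char_log_prob, similarity, alpha=1.0):
    """Pick the candidate with the best combined score (assumed scoring rule).

    char_log_prob(sentence, i) returns the log-probability of the character
    at position i; similarity(a, b) must return a value in (0, 1].
    """
    def score(cand):
        total = 0.0
        for i, (o, c) in enumerate(zip(original, cand)):
            total += char_log_prob(cand, i)
            if o != c:
                # Penalize replacements that are less similar to the original character.
                total += alpha * math.log(similarity(o, c))
        return total
    return max(candidates, key=score)


# Toy stand-ins (assumptions): a real system would use a language model
# and a phonetic/graphic similarity measure instead.
KNOWN_BIGRAMS = {"苹果"}

def toy_char_log_prob(sentence, i):
    in_bigram = sentence[i:i + 2] in KNOWN_BIGRAMS or sentence[max(i - 1, 0):i + 1] in KNOWN_BIGRAMS
    return math.log(0.9 if in_bigram else 0.1)

def toy_similarity(a, b):
    return 0.8

original = "我喜欢吃平果"
toy_confusion_set = {"平": {"苹", "评"}}
candidates = generate_candidates(original, [4], toy_confusion_set)
best = reevaluate(original, candidates, toy_char_log_prob, toy_similarity)
print(best)
```

Keeping the original character among the options lets the reevaluation step leave a sentence unchanged when no substitution scores higher, and restricting substitutions to the confusion set corresponds to the filtering role the abstract attributes to it.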