Abstract: Chinese spelling correction (CSC), which detects and corrects spelling errors in text, is a long-standing goal in natural language processing and a foundation for many language-related tasks. Many CSC methods leverage multimodal information (the character itself, its pronunciation, and its shape) to connect incorrect characters with their corrections. Research indicates that a majority of spelling errors stem from pinyin similarity, while character shape similarity accounts for half of the errors (an error can fall into both categories). Effectively modeling the relationship between a character and its pinyin is therefore a key challenge in CSC. In this study, we propose enhancing CSC with an auxiliary pinyin character prediction task. We apply an adaptive weighting method to this task so that the pinyin prediction is weighted at a finer granularity, balancing the pinyin and character prediction tasks. The proposed model, SPMSpell, uses ChineseBERT as its encoder to capture multimodal features simultaneously. It has three parallel decoders: one for character prediction, one for pinyin prediction, and one for the self-distillation module. To mitigate potential overfitting to pinyin, the self-distillation method prioritizes character information in the predictions. Extensive experiments on three SIGHAN benchmarks show that the proposed model achieves superior performance. These results support the effectiveness of both the adaptive weighted pinyin character prediction task and the self-distillation module.
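The abstract does not give the loss formulation, so the following is only a minimal sketch of how a three-branch objective of this kind might be combined for a single token. Everything in it is an assumption made for illustration: the function name `token_loss`, the choice of the correct character's predicted probability as the adaptive weight, the KL-divergence form of the distillation term, and the `distill_weight` parameter are hypothetical and are not taken from SPMSpell.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index."""
    return -math.log(probs[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_loss(char_logits, pinyin_logits, distill_logits,
               char_target, pinyin_target, distill_weight=1.0):
    """Hypothetical per-token loss for three parallel decoders.

    Assumptions (not from the paper):
      * the adaptive weight is the probability the character decoder
        assigns to the correct character, treated as a constant;
      * the distillation term is KL(distill branch || character branch).
    """
    char_probs = softmax(char_logits)
    pinyin_probs = softmax(pinyin_logits)
    distill_probs = softmax(distill_logits)

    char_loss = cross_entropy(char_probs, char_target)
    pinyin_loss = cross_entropy(pinyin_probs, pinyin_target)

    # Assumed adaptive weight: confident character predictions let the
    # pinyin task contribute more; uncertain ones scale it down.
    adaptive_weight = char_probs[char_target]

    # Assumed distillation term pulling the character branch toward
    # the distillation branch.
    distill_loss = kl_divergence(distill_probs, char_probs)

    return char_loss + adaptive_weight * pinyin_loss + distill_weight * distill_loss
```

In a real training loop the adaptive weight would normally be detached from the computation graph so that it rescales the pinyin loss without receiving gradients; this pure-Python version only computes the scalar value.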

A Hybrid Approach Towards Chinese Spelling and Splitting Error Correction

A Multimodal Method for Chinese Spelling Correction

Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction

Improve Chinese Spelling Check by Reevaluation

Is Chinese Spelling Check ready? Understanding the correction behavior in real-world scenarios

Towards Robust Chinese Spelling Check Systems: Multi-round Error Correction with Ensemble Enhancement

Spelling Error Correction with Soft-Masked BERT

Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking

Bridging the Gap: A Self-Learning Model Using Implicit Knowledge for Chinese Spelling Correction

MCSSpell: Optimal Path Selection of Candidate Characters by Integrating Multimodal Information and Copy Mechanism for Chinese Spelling Correction

MISpeller: Multimodal Information Enhancement for Chinese Spelling Correction

Correcting Chinese Spelling Errors with Phonetic Pre-training

Improving Chinese Spelling Correction by Ranking

Chinese Spelling Error Correction by Multi-Task Learning with Pronunciation Gap Predictor

Self-Distillation and Pinyin Character Prediction for Chinese Spelling Correction Based on Multimodality

Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models

MLSL-Spell: Chinese Spelling Check Based on Multi-Label Annotation

Disentangled Phonetic Representation for Chinese Spelling Correction

Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check

CSEC: A Chinese Semantic Error Correction Dataset for Written Correction

uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers