An Alignment-Agnostic Model for Chinese Text Error Correction

Liying Zheng, Yue Deng, Weishun Song, Liang Xu, Jing Xiao

DOI: https://doi.org/10.48550/arXiv.2104.07190

2021-09-18

Abstract: This paper investigates how to correct Chinese text errors involving mistaken, missing, and redundant characters, which are common among native Chinese speakers. Most existing models based on the detect-correct framework can correct mistaken characters, but they cannot deal with missing or redundant characters. The reason is that the sentence lengths before and after correction differ, leading to an inconsistency between model inputs and outputs. Although Seq2Seq-based and sequence tagging methods provide solutions to this problem and achieve relatively good results in English, they do not perform well in Chinese according to our experimental results. In our work, we propose a novel detect-correct framework that is alignment-agnostic, meaning that it can handle both aligned and non-aligned text, and it can also serve as a cold-start model when no annotated data is provided. Experimental results on three datasets demonstrate that our method is effective and achieves the best performance among existing published models.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses error correction in Chinese text, focusing on three error types that native Chinese speakers commonly make when writing: mistaken characters, missing characters, and redundant characters. Existing models based on the detect-correct framework handle mistaken characters relatively well, but they cannot handle missing or redundant characters: those errors change the sentence length, so the model's input and output no longer align. Sequence-to-sequence (Seq2Seq) and sequence tagging methods do address this length mismatch and achieve relatively good results on English, but according to the authors' experiments they do not perform well on Chinese. The authors therefore propose a new alignment-agnostic detect-correct framework. It handles both aligned and non-aligned text, and it can also serve as a cold-start model when no annotated data is available. Experiments on three datasets show that the method is effective, and the authors report the best performance among existing published models.
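To make the three error types concrete, the following minimal sketch labels the character-level differences between an erroneous sentence and its correction. It is not the authors' model: it relies on Python's standard `difflib.SequenceMatcher`, and the function name `classify_edits` and the example sentences are assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the three error types named in the paper.
# "replace": a wrong character stands where another should be  -> mistaken
# "delete":  the source has an extra character                 -> redundant
# "insert":  the source lacks a character the target has       -> missing
ERROR_TYPE = {"replace": "mistaken", "delete": "redundant", "insert": "missing"}


def classify_edits(source: str, target: str) -> list[tuple[str, str, str]]:
    """Return (error_type, source_span, target_span) for each differing span.

    Hypothetical helper for illustration; it compares a sentence with its
    known correction and does not detect errors on its own.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((ERROR_TYPE[tag], source[i1:i2], target[j1:j2]))
    return edits


if __name__ == "__main__":
    correct = "我今天很高兴"
    examples = [
        "我今天很高心",    # mistaken character: 心 written for 兴
        "我今很高兴",      # missing character: 天 left out
        "我今天天很高兴",  # redundant character: 天 written twice
    ]
    for wrong in examples:
        same_length = len(wrong) == len(correct)
        print(wrong, classify_edits(wrong, correct), "aligned:", same_length)
```

Only the mistaken-character example keeps the sentence length unchanged. The missing and redundant examples differ in length from the correction, which is the misalignment that a model predicting one output character per input character cannot express.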

Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector

Research on Chinese Text Error Correction Based on Sequence Model

An Error-Guided Correction Model for Chinese Spelling Error Correction

Automatic Chinese text error correction approach based-on fast approximate Chinese word-matching algorithm

Winnow-based approach in automatic error detection and correction of Chinese text

An Adversarial Multi-Task Learning Method for Chinese Text Correction with Semantic Detection

MIATS: A Chinese Spelling Error Correction Algorithm Based on Multimodal Information Alignment of Three-Towers Structure

From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Adjusting the Precision-Recall Trade-Off with Align-and-Predict Decoding for Grammatical Error Correction

Bi-LSTM Neural Networks for Chinese Grammatical Error Diagnosis

Multi-head Sequence Tagging Model for Grammatical Error Correction

A Chinese Grammatical Error Correction Model Based On Grammatical Generalization And Parameter Sharing

Count, Decompose and Correct: A New Approach to Handwritten Chinese Character Error Correction

Focus Is What You Need For Chinese Grammatical Error Correction

On the (In)Effectiveness of Large Language Models for Chinese Text Correction

CRASpell: A Contextual Typo Robust Approach to Improve Chinese Spelling Correction

Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction

LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction

A Chinese OCR Spelling Check Approach Based on Statistical Language Models