Method for Chinese Grammar Error Detection Integrating ELECTRA and Text Local Information
CHEN Bailin, WANG Tianji, REN Lina, HUANG Ruizhang
DOI: https://doi.org/10.19678/j.issn.1000-3428.0064014
2023-01-01
Abstract: Grammar error detection is a basic task in natural language processing. It aims to automatically identify typos, grammatical errors, and word-order errors in text. Compared with other languages, Chinese grammar is flexible and lacks explicit markers such as tense and voice, so the local information of the text plays an important role in Chinese Grammar Error Detection (CGED). Conventional machine learning methods struggle to detect grammatical errors in text, whereas existing deep learning methods do not fully and effectively exploit the local information of the text, which leads to poor detection performance. To solve this problem, this study proposes a CGED model, ELECTRA-GCNN-CRF, that integrates ELECTRA with the local information of the text. Grammar error detection is treated as a sequence labeling task. First, the text is represented by the ELECTRA pretrained language model. Second, a Convolutional Neural Network (CNN) extracts the local position and semantic information of the text, and residual and gating mechanisms are introduced to reduce the impact of invalid information. Finally, a Conditional Random Field (CRF) model learns the dependencies between tags and outputs a grammar error tag sequence that conforms to the labeling rules. The proposed model is evaluated on the NLPTEA Chinese grammatical error evaluation dataset. Compared with the baseline model, the F1 scores at the detection, identification, and position levels increase by 0.94, 3.74, and 5.03 percentage points, respectively, showing that the model improves grammatical error detection.
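The convolution step with residual and gating mechanisms can be illustrated with a minimal sketch. The abstract does not give the layer's equations, so the code below assumes one common formulation: a sigmoid gate scales a convolution over a window of neighbouring tokens, and the result is added back to the input. The function name `gated_conv_residual`, its parameters, and the pure-Python list layout are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function expressed through tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def gated_conv_residual(x, w_value, w_gate, b_value, b_gate):
    """Assumed gated 1-D convolution with a residual connection.

    x        : list of seq_len token vectors, each of length dim
    w_value  : nested list [kernel][dim][dim], weights of the value branch
    w_gate   : nested list [kernel][dim][dim], weights of the gate branch
    b_value  : list of length dim, bias of the value branch
    b_gate   : list of length dim, bias of the gate branch
    Returns a list with the same shape as x.
    """
    kernel = len(w_value)
    dim = len(x[0])
    assert kernel % 2 == 1, "an odd kernel keeps the window centred on each token"
    pad = kernel // 2
    zero = [0.0] * dim
    # Zero-pad both ends so every token has a full window of neighbours.
    padded = [zero] * pad + [list(v) for v in x] + [zero] * pad

    def conv_at(t, w, b):
        # Convolution output for the window centred on token t.
        out = list(b)
        for k in range(kernel):
            token = padded[t + k]
            for i in range(dim):
                for j in range(dim):
                    out[j] += token[i] * w[k][i][j]
        return out

    result = []
    for t in range(len(x)):
        value = conv_at(t, w_value, b_value)                      # local features
        gate = [sigmoid(g) for g in conv_at(t, w_gate, b_gate)]   # values in (0, 1)
        # Residual connection: the input passes through unchanged and the
        # gated local features are added on top.
        result.append([x[t][j] + gate[j] * value[j] for j in range(dim)])
    return result
```

In a pipeline like the one the abstract describes, `x` would presumably hold the ELECTRA token representations and the output would feed the CRF layer. With a kernel of size 3, each output vector depends only on a token and its immediate neighbours, which is the sense in which such a layer captures local information.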