HWCGEC: HW-TSC's 2023 Submission for the NLPCC2023's Chinese Grammatical Error Correction Task

Chang Su, Xiaofeng Zhao, Xiaosong Qiao, Min Zhang, Hao Yang, Junhao Zhu, Ming Zhu, Wenbing Ma
DOI: https://doi.org/10.1007/978-3-031-44699-3_6
2023-01-01
Abstract: Deep learning has shown remarkable effectiveness in various language tasks. This paper presents HWCGEC, the work of Huawei Translation Services Center (HW-TSC), which achieved the best performance among the seven results submitted to NLPCC2023 shared task 1, Chinese grammatical error correction (CGEC). CGEC aims to automatically correct grammatical errors that violate language rules, converting noisy input text into clean output text. Our experiments show that, after fine-tuning, BART, a sequence-to-sequence (seq2seq) model, outperforms ChatGLM, a large language model (LLM), when the training set is large and the LLM is fine-tuned in LoRA mode, which updates only a small number of parameters. Additionally, the BART model achieves good results on the CGEC task through data augmentation and curriculum learning. Although the LLM performs poorly in these experiments, LLMs possess excellent logical abilities. As training sets become more diverse and data augmentation methods for them become more refined, LLMs trained with supervised fine-tuning (SFT) are expected to achieve significant improvements on CGEC tasks in the future.