Evaluating Performance of LLaMA2 Large Language Model Enhanced by QLoRA Fine-Tuning for English Grammatical Error Correction.

Jing An, Yanbing Bai, Jiyi Li, Junjie Hu, Rui Li, Yuxi Xia, Rui Hua
DOI: https://doi.org/10.1007/978-3-031-68309-1_16
2024-01-01
Abstract: Large Language Models (LLMs) have advanced significantly across a wide range of contexts. However, their impact on vertical fields remains understudied and unsatisfactory, because these fields demand a high degree of domain-specific expertise. English Grammatical Error Correction (GEC) is urgently needed in academic and educational settings, where it still faces challenges in precision, adaptability, and the handling of complex grammatical mistakes. The release of the C4_200M Synthetic Dataset and advances in QLoRA fine-tuning for LLaMA2 present an unprecedented opportunity to examine these issues more closely. This study assesses the performance of LLaMA2 on GEC. We implemented a LLaMA2 model fine-tuned with QLoRA in a scalable Spark cluster processing environment, investigated model performance under two prompting methods, Zero-shot and Few-shot, and configured the text generation parameters, including Top-p, Top-k, and Beam search. We built an efficient, accurate, and scalable system, with BLEU rising from 12.33 to 14.8, ROUGE rising from 19.33% to 25.97%, and edit distance falling from 4.21 to 1.89, providing a solid foundation for future work. The code of this paper is available at LINK.
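As an illustration of the edit distance metric reported above, the following is a minimal sketch of Levenshtein distance between a model output and a reference correction. The abstract does not say whether the paper measures distance over characters or tokens, nor which implementation it uses; the function name `edit_distance` and the word-level example sentences are assumptions made here for illustration only.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences (strings or token lists).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    # prev[j] holds the distance between the processed prefix of `source`
    # and the first j items of `target`.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to an empty target
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(
                prev[j] + 1,         # delete s from source
                curr[j - 1] + 1,     # insert t into source
                prev[j - 1] + cost,  # substitute s with t, or keep a match
            ))
        prev = curr
    return prev[-1]


# Word-level comparison of an uncorrected sentence against its correction
# (example sentences are invented for this sketch).
hypothesis = "She go to school yesterday".split()
reference = "She went to school yesterday".split()
print(edit_distance(hypothesis, reference))  # prints 1
```

A lower value means the output is closer to the reference, which matches the direction of the reported change from 4.21 to 1.89.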