Automatic Detection for Machine-generated Texts is Easy

Mingyang Lyu, Chenlong Bao, Jintao Tang, Ting Wang, Peilei Liu
DOI: https://doi.org/10.1109/smartworld-uic-atc-scalcom-digitaltwin-pricomp-metaverse56740.2022.00223
2022-01-01
Abstract: Recognizing machine-generated text can reveal problems in the underlying text generation model, so we evaluate how well machines and humans can detect generated text. Specifically, we build five evaluation datasets by combining the raw corpus with text generated by models such as GPT2-Chinese, the Jiuge system, and GPT-Neo. We compare manual classification by volunteers against two classical classification models, BiLSTM and FastText, and find that the machine classifiers reach an average accuracy of 94.54%, whereas humans average only 65.87%. We therefore investigate in depth why machines can so effectively distinguish human-written text from machine-generated text. Based on feedback from human evaluators and our evaluation of the generated text, this paper identifies six weaknesses of machine-generated text: topic drift, prolix sentences, abnormal paragraphs, poor control of text length, overused phrases, and sparsity of uncommon characters. We test these defects on the different datasets using analysis models and statistical methods, providing guidance for further improving the quality of model-generated text.
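The abstract states that the defects were tested with analysis models and statistical methods but does not specify them. As a minimal sketch, assuming simple character-level statistics (our own illustrative choice, not necessarily the measures used in the paper), two of the defects, overused phrases and sparsity of uncommon characters, could be quantified as follows. The function names `repeated_ngram_ratio` and `rare_char_ratio`, the n-gram length `n`, and the rarity threshold `min_count` are all hypothetical. Character n-grams are used because the datasets are largely Chinese, where word boundaries are not marked by spaces.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Share of character n-gram occurrences whose n-gram appears more than once.

    Hypothetical proxy for "overused phrases": a higher value means the text
    reuses the same short character sequences more heavily.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    # Count every occurrence of any n-gram that is repeated at least once.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


def rare_char_ratio(text: str, reference_counts: Counter, min_count: int = 2) -> float:
    """Share of non-space characters in `text` that are rare in a reference corpus.

    Hypothetical proxy for "sparsity of uncommon characters": a character is
    treated as rare if it occurs fewer than `min_count` times in
    `reference_counts` (an assumed threshold). Comparing this ratio between
    human-written and machine-generated texts would show whether the generator
    avoids uncommon characters.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    # Counter returns 0 for unseen characters, so they count as rare.
    rare = sum(1 for ch in chars if reference_counts[ch] < min_count)
    return rare / len(chars)


if __name__ == "__main__":
    reference = Counter("aaabbbc")
    print(repeated_ngram_ratio("abcabc", 3))   # 0.5
    print(rare_char_ratio("abcd", reference))  # 0.5
```

In practice one would compute these ratios over each dataset's human-written and machine-generated subsets and compare their distributions; the thresholds above are placeholders rather than values from the paper.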