Chinese Spam Detection Based on Prompt Tuning

Yan Zhang, Chunyan An
DOI: https://doi.org/10.18293/seke2022-120
2022-01-01
Abstract: Spam has plagued Internet users for a long time, so designing an efficient spam detection method is of great significance. In recent years, spam detection methods based on fine-tuning pre-trained language models (PLMs) have achieved great success. These methods fine-tune a PLM on a large dataset to adapt it to the downstream spam detection task. However, the objective of the PLM's initial training phase is inconsistent with the objective of the downstream task, so the downstream task cannot fully utilize the latent knowledge in the PLM. In this paper, we use prompt tuning with a PLM to identify Chinese spam: we construct additional prompt templates, convert the email classification task into a fill-in-the-blank task, and then derive the classification result from the content filled into the prompt template. This process closely resembles the initial training of the PLM, so it can make fuller use of the rich knowledge in the PLM. We use prompt tuning to train the model on public datasets. In our experiments, the proposed model reaches an accuracy of 0.996 and an F1 score of 0.994 on the trec06 datasets, outperforming the comparison model. In terms of convergence speed, the proposed model needs fewer than 200 training steps to converge, which is faster than the comparison model.
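The conversion from classification to a fill-in-the-blank task can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the abstract does not give the authors' templates or label words, so `TEMPLATE`, `VERBALIZER`, and the English wording are hypothetical, and `toy_mask_scores` is a keyword heuristic standing in for a real masked language model, not the model used in the paper.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"

# Hypothetical prompt template (not from the paper): the email is wrapped
# in a sentence with one blank that the language model has to fill in.
TEMPLATE = "{email} This is a {mask} email."

# Hypothetical verbalizer: maps each class to the label words whose
# scores at the blank position count as evidence for that class.
VERBALIZER: Dict[str, List[str]] = {
    "spam": ["junk", "spam"],
    "ham": ["normal", "regular"],
}


def build_prompt(email: str) -> str:
    """Wrap the raw email text in the fill-in-the-blank template."""
    return TEMPLATE.format(email=email, mask=MASK)


def classify(email: str, mask_scores: Callable[[str], Dict[str, float]]) -> str:
    """Pick the class whose label words score highest at the blank.

    `mask_scores` takes a prompt and returns a score per candidate word
    for the masked position; in practice a masked language model would
    play this role.
    """
    scores = mask_scores(build_prompt(email))
    class_scores = {
        label: sum(scores.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(class_scores, key=class_scores.get)


def toy_mask_scores(prompt: str) -> Dict[str, float]:
    """Stand-in for a masked language model: a crude keyword heuristic."""
    suspicious = ("free", "winner", "click")
    hits = sum(word in prompt.lower() for word in suspicious)
    p_spam = min(0.9, 0.1 + 0.4 * hits)
    return {
        "junk": p_spam,
        "spam": p_spam,
        "normal": 1.0 - p_spam,
        "regular": 1.0 - p_spam,
    }


if __name__ == "__main__":
    print(classify("Click here, you are a winner of a free prize!", toy_mask_scores))
    print(classify("Meeting moved to 3pm tomorrow.", toy_mask_scores))
```

In a real setup the scoring function would be replaced by a masked language model's predictions at the blank position, and prompt tuning would train on labeled emails so that the correct label words receive the highest scores.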