Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan
2024-03-20
Abstract: Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users of open-domain dialogue systems. However, detecting pornographic language within human-machine interaction dialogues is an important subject that has rarely been studied. To advance research in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether a dialogue session contains pornographic content. To this end, we collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose using knowledge distillation of large language models to annotate the dataset. Specifically, the raw dataset is first annotated by four open-source large language models, with the majority vote determining the label. Second, we use ChatGPT to fill in the labels left empty in the first step. Third, to ensure the quality of the validation and test sets, we use GPT-4 for label calibration: if the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers trained on the pseudo-labeled dataset. Detailed data analysis demonstrates that leveraging knowledge distillation techniques with large language models provides a practical and cost-efficient method for developing pornographic text detectors.
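The three-step labeling pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the LLM annotators are modeled as plain callables, and the function names (majority_vote, annotate, calibrate, self_criticize) and label strings are hypothetical. It also assumes that an "empty" label arises when the open-source models' votes are tied (for example, a 2-2 split among four models).

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Label = str  # hypothetical label names, e.g. "porn" / "normal"


def majority_vote(votes: Sequence[Optional[Label]]) -> Optional[Label]:
    """Step 1: pick the most frequent label among the open-source LLMs.

    Assumption: a tie for the top count (e.g. 2-2 among four models)
    or no usable votes leaves the label empty (None).
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the label empty
    return ranked[0][0]


def annotate(
    text: str,
    open_llms: Sequence[Callable[[str], Optional[Label]]],
    chatgpt: Callable[[str], Label],
) -> Label:
    """Steps 1-2: majority vote, then ChatGPT fills any empty label."""
    label = majority_vote([llm(text) for llm in open_llms])
    if label is None:
        label = chatgpt(text)
    return label


def calibrate(
    text: str,
    label: Label,
    gpt4: Callable[[str], Label],
    self_criticize: Callable[[str, Label, Label], Label],
) -> Label:
    """Step 3 (validation/test sets only): check the label against GPT-4.

    On disagreement, a self-criticism routine (hypothetical callable)
    decides which label is correct.
    """
    gpt4_label = gpt4(text)
    if gpt4_label == label:
        return label
    return self_criticize(text, label, gpt4_label)


if __name__ == "__main__":
    # Stub annotators standing in for four open-source LLMs (2-2 tie).
    models = [lambda t: "porn", lambda t: "porn",
              lambda t: "normal", lambda t: "normal"]
    print(annotate("example utterance", models, chatgpt=lambda t: "normal"))
    # prints "normal"
```

Keeping each model behind a callable separates the voting logic from how any particular LLM is prompted, so the same pipeline applies whichever models or prompts are used.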
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of detecting pornographic text in open-domain dialogue systems. The authors note that pornographic content appearing in human-machine dialogues can seriously harm users, especially children and adolescents, yet little research has addressed detecting such content. To advance research in this direction, they propose CensorChat, a dialogue monitoring dataset for detecting whether a dialogue contains pornographic content. To build it, they first collected multi-turn dialogues from real human-machine interactions and split them into individual utterances and single-turn dialogues. They then annotated the dataset through knowledge distillation from large language models, which reduces time and labor costs: four open-source large language models label each sample by majority vote, ChatGPT fills in the labels left empty by that vote, and GPT-4 calibrates the labels of the validation and test sets, with a self-criticism step when GPT-4 disagrees with the existing label. Finally, the authors trained a series of text classifiers on the pseudo-labeled dataset and evaluated them on the test set. Through this approach, the paper shows that knowledge distillation from large language models is a practical and cost-effective way to develop pornographic text detectors.
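The dataset construction step (splitting multi-turn dialogues into single utterances and single-turn dialogues ending with the chatbot's reply) might look roughly like the sketch below. It is an assumption-laden illustration, not the authors' preprocessing: dialogues are assumed to be lists of (speaker, text) pairs with the hypothetical speaker tags "user" and "bot", every utterance from either speaker is kept as a single-utterance sample, and a single-turn dialogue is taken to be a user utterance immediately followed by a bot reply.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, text); speaker tags are hypothetical


def split_dialogue(
    dialogue: List[Turn],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a multi-turn dialogue into two kinds of samples.

    Returns:
      utterances: every utterance on its own (assumption: both speakers).
      pairs: (user, bot) single-turn dialogues whose last utterance
             is spoken by the chatbot.
    """
    utterances = [text for _, text in dialogue]
    pairs = []
    # Walk over adjacent turns and keep user -> bot exchanges only.
    for (prev_speaker, prev_text), (curr_speaker, curr_text) in zip(
        dialogue, dialogue[1:]
    ):
        if prev_speaker == "user" and curr_speaker == "bot":
            pairs.append((prev_text, curr_text))
    return utterances, pairs
```

Requiring the bot to speak last in each pair matches the stated goal of monitoring what the chatbot says in context, while the single-utterance samples allow detection without any surrounding dialogue.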