Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan

2024-03-20

Abstract: Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users in open-domain dialogue systems. However, detecting pornographic language within human-machine interaction dialogues is an important but rarely studied subject. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether the dialogue session contains pornographic content. To this end, we collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose utilizing knowledge distillation of large language models to annotate the dataset. Specifically, first, the raw dataset is annotated by four open-source large language models, with the majority vote determining the label. Second, we use ChatGPT to update the empty label from the first step. Third, to ensure the quality of the validation and test sets, we utilize GPT-4 for label calibration. If the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers using a pseudo-labeled dataset. Detailed data analysis demonstrates that leveraging knowledge distillation techniques with large language models provides a practical and cost-efficient method for developing pornographic text detectors.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the detection of pornographic text in open-domain dialogue systems. The researchers note that pornographic content appearing in human-machine dialogues can have serious negative effects on users, especially children and adolescents, yet little research has targeted its detection. To advance work in this direction, the authors introduce CensorChat, a dialogue monitoring dataset for detecting whether a dialogue contains pornographic content. To construct it, they collected multi-turn dialogues from real human-machine interactions and split them into single utterances and single-turn dialogues. They then annotated the data through knowledge distillation of large language models, which reduces time and labor costs. The annotation has three steps: four open-source large language models label each sample and the majority vote determines the label; ChatGPT fills in the labels left empty by the first step; and GPT-4 calibrates the labels of the validation and test sets, with a self-criticism check applied whenever its label disagrees with the current one. Finally, the researchers trained a series of text classifiers on the pseudo-labeled dataset and evaluated them on the test set. The paper concludes that knowledge distillation of large language models is a practical and cost-efficient way to develop pornographic text detectors.
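The three annotation steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the label strings, the function names (`majority_vote`, `annotate`, `calibrate`), and the callables standing in for the models are all hypothetical. It also assumes that an "empty" label arises when the votes produce no strict winner, and that the self-criticism check decides whether the GPT-4 label replaces the current one; the source does not spell out either detail.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Label = str  # hypothetical label names, e.g. "porn" / "non-porn"


def majority_vote(votes: Sequence[Optional[Label]]) -> Optional[Label]:
    """Return the strict-majority label, or None (empty) when there is no winner."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    # Assumption: a tie between the top two labels leaves the label empty.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def annotate(
    text: str,
    annotators: Sequence[Callable[[str], Optional[Label]]],
    fallback: Callable[[str], Label],
) -> Label:
    """Steps 1 and 2: vote with several models, then fill an empty label."""
    label = majority_vote([annotate_one(text) for annotate_one in annotators])
    if label is None:
        label = fallback(text)  # stands in for the ChatGPT step
    return label


def calibrate(
    text: str,
    label: Label,
    judge: Callable[[str], Label],
    self_check: Callable[[str, Label], bool],
) -> Label:
    """Step 3: compare with a stronger judge; on mismatch, run a self-check."""
    judged = judge(text)  # stands in for the GPT-4 step
    if judged == label:
        return label
    # Assumption: the judge's label is adopted only if it survives the self-check.
    return judged if self_check(text, judged) else label


if __name__ == "__main__":
    # Toy stand-ins for the four open-source models (two say "porn", two do not).
    models = [lambda t: "porn", lambda t: "porn",
              lambda t: "non-porn", lambda t: "non-porn"]
    label = annotate("example utterance", models, fallback=lambda t: "non-porn")
    print(calibrate("example utterance", label,
                    judge=lambda t: "porn", self_check=lambda t, l: False))
```

Each model is passed in as a plain callable so the control flow can be read and tested without any API access; a real pipeline would wrap model calls and prompt parsing behind those callables.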
