ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information

Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Congrui Huang
2024-09-23
Abstract: In different NLP tasks, detecting harmful content is crucial for online environments, especially with the growing influence of social media. However, previous research has two main issues: 1) a lack of data in low-resource settings, and 2) inconsistent definitions and criteria for judging harmful content, requiring classification models to be robust to spurious features and diverse. We propose ToxiCraft, a novel framework for synthesizing datasets of harmful information to address these weaknesses. With only a small amount of seed data, our framework can generate a wide variety of synthetic, yet remarkably realistic, examples of toxic information. Experimentation across various datasets showcases a notable enhancement in detection model robustness and adaptability, surpassing or coming close to the results obtained with gold labels. We will release the generated data on GitHub upon acceptance.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two major issues in detecting harmful information in online environments:

1. **Lack of Data in Low-Resource Settings**: In many tasks, especially on social media, data on harmful information is very limited, which restricts model training and performance.
2. **Inconsistent Definitions and Standards of Harmful Content**: Different studies define and judge harmful content differently, leaving classification models vulnerable to spurious features and lacking diversity.

To solve these problems, the authors propose a new framework called **ToxiCraft** for synthesizing datasets of harmful information. Starting from only a small amount of seed data, the framework can generate a wide variety of realistic examples of harmful information. Experimental results show that data generated with ToxiCraft notably improves the robustness and adaptability of detection models, approaching or exceeding the performance obtained with gold labels.