DiffuDetox: A Mixed Diffusion Model for Text Detoxification

Griffin Floto, Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Zhenwei Tang, Ali Pesaranghader, Manasa Bharadwaj, Scott Sanner
DOI: https://doi.org/10.48550/arXiv.2306.08505
2023-06-14
Abstract: Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text. It is highly useful for online forums and social media, where offensive content is frequently encountered. Intuitively, there are diverse ways to detoxify sentences while preserving their meanings, and we can select from detoxified sentences before displaying text to users. Conditional diffusion models are particularly suitable for this task given their demonstrated higher generative diversity than existing conditional text generation models based on language models. Nonetheless, text fluency declines when they are trained with insufficient data, which is the case for this task. In this work, we propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification. The conditional model takes toxic text as the condition and reduces its toxicity, yielding a diverse set of detoxified sentences. The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus ensures text fluency. Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed DiffuDetox.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the frequent occurrence of harmful and offensive language (toxic text) in online forums and social media. Specifically, the authors treat detoxification as a conditional text generation task: remove offensive content from a sentence while keeping its meaning unchanged. The task matters for improving online environments and for reducing associated mental health problems.

### Core Problems of the Paper

1. **Removing Offensive Content**: How to convert text containing offensive and insulting words into harmless, fluent text.
2. **Maintaining Semantic Consistency**: How to keep the generated text as close as possible in meaning to the original text.
3. **Increasing Generation Diversity**: How to produce multiple candidate detoxifications, so that one can be selected before the text is shown to users.
4. **Data Scarcity Challenge**: How to train an effective model in a low-resource setting, given that detoxification data is relatively scarce.

### Solutions

To address these challenges, the authors propose DiffuDetox, a mixed conditional and unconditional diffusion model with the following main features:

- **Conditional Diffusion Model**: Takes toxic text as the condition and generates diverse detoxified sentences through a series of diffusion steps.
- **Unconditional Diffusion Model**: Is trained to recover the input text, which allows additional fluent text to be introduced for training and thereby improves the fluency of the generated text.
- **Combining the Advantages of Both**: The predictions of the conditional and unconditional models are linearly combined, which both reduces toxicity and improves fluency.

### Model Framework

The overall framework of DiffuDetox is shown in Figure 1 and includes two main parts:

1. **Conditional Learning Stage**: With probability \(\varphi\), the conditional gate is closed; \(x_0\) and \(c\) are sampled from the detoxification dataset as the non-toxic and toxic texts, respectively.
2. **Unconditional Learning Stage**: With probability \(1-\varphi\), the conditional gate is open; \(x_0\) is sampled from the fluent text corpus for training.

### Experimental Results

DiffuDetox outperforms existing baseline methods on multiple evaluation metrics and achieves human-level detoxification performance. The metrics are BLEU, style accuracy (STA), content preservation (SIM), fluency (FL), and the joint score (J). In particular, DiffuDetox exceeds the human level on the J score, showing its potential in practical applications.

### Conclusion

DiffuDetox addresses the key problems of the text detoxification task by combining conditional and unconditional diffusion models, and shows potential for real-world applications. Future work will optimize the inference speed of the model and explore further improvements.
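
The mixed training schedule and the linear combination of predictions summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the list-based "predictions", and the guidance weight `weight` are hypothetical. The combination rule `(1 + w) * cond - w * uncond` is the usual classifier-free guidance form and is assumed here, since the summary only says the two predictions are linearly combined.

```python
import random


def sample_training_example(paired_data, fluent_corpus, phi, rng):
    """Draw one training example under the mixed schedule.

    With probability phi: conditional example, where x0 is the non-toxic
    sentence and c is its toxic counterpart from the detoxification dataset.
    With probability 1 - phi: unconditional example, where x0 is a sentence
    from the fluent corpus and c is None (no condition).
    """
    if rng.random() < phi:
        toxic, non_toxic = rng.choice(paired_data)
        return non_toxic, toxic
    return rng.choice(fluent_corpus), None


def combine_predictions(cond_pred, uncond_pred, weight):
    """Linearly combine conditional and unconditional predictions.

    Assumed form (classifier-free guidance): (1 + w) * cond - w * uncond.
    With weight = 0 this reduces to the conditional prediction alone.
    """
    return [(1.0 + weight) * c - weight * u
            for c, u in zip(cond_pred, uncond_pred)]


# Toy usage with made-up data; real models would predict tensors, not lists.
rng = random.Random(0)
paired = [("toxic sentence", "detoxified sentence")]
fluent = ["an ordinary fluent sentence"]
x0, c = sample_training_example(paired, fluent, phi=0.8, rng=rng)
mixed = combine_predictions([0.5, -0.2], [0.1, 0.1], weight=2.0)
```

Passing `c = None` for unconditional examples lets a single denoising network serve both roles, which is the common design in classifier-free guidance; whether DiffuDetox shares one network in exactly this way is not stated in the summary.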