Abstract: Text detoxification is a conditional text generation task aiming to remove offensive content from toxic text. It is highly useful for online forums and social media, where offensive content is frequently encountered. Intuitively, there are diverse ways to detoxify sentences while preserving their meanings, and we can select from detoxified sentences before displaying text to users. Conditional diffusion models are particularly suitable for this task given their demonstrated higher generative diversity than existing conditional text generation models based on language models. Nonetheless, text fluency declines when they are trained with insufficient data, which is the case for this task. In this work, we propose DiffuDetox, a mixed conditional and unconditional diffusion model for text detoxification. The conditional model takes toxic text as the condition and reduces its toxicity, yielding a diverse set of detoxified sentences. The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus ensures text fluency. Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed DiffuDetox.

What problem does this paper attempt to address?

The problem this paper addresses is the frequent occurrence of harmful and offensive language (toxic text) in online forums and social media. Specifically, the authors study how to remove offensive content from a sentence through conditional text generation while keeping its meaning unchanged. The task matters for improving the online environment and reducing the mental health harm associated with toxic content.

### Core Problems of the Paper

1. **Removing Offensive Content**: How to convert text containing offensive and insulting words into harmless, fluent text.
2. **Maintaining Semantic Consistency**: How to keep the generated text as close as possible in meaning to the original during detoxification.
3. **Increasing Generation Diversity**: How to produce multiple candidate detoxifications, so that the best one can be selected before text is displayed to users.
4. **Data Scarcity Challenge**: How to train an effective model in a low-resource setting, given that parallel detoxification data is scarce.

### Solutions

To address these challenges, the authors propose DiffuDetox, a mixed conditional and unconditional diffusion model with the following main features:

- **Conditional Diffusion Model**: Takes toxic text as the condition and generates diverse detoxified sentences through a series of diffusion steps.
- **Unconditional Diffusion Model**: Is trained to recover the input text, which allows additional fluent text to be used for training and thereby supports the fluency of the generated text.
- **Combining the Advantages of Both**: The predictions of the conditional and unconditional models are linearly combined, so that toxicity is reduced while fluency is improved.

### Model Framework

The overall framework of DiffuDetox is shown in Figure 1 and includes two main parts:

1. **Conditional Learning Stage**: With probability \(\varphi\) the conditional gate is closed, and \(x_0\) and \(c\) are sampled from the detoxification dataset as the non-toxic and toxic texts, respectively.
2. **Unconditional Learning Stage**: With probability \(1-\varphi\) the conditional gate is open, and \(x_0\) is sampled from the fluent text corpus for training.

### Experimental Results

The experimental results show that DiffuDetox outperforms existing baseline methods on multiple evaluation metrics and achieves human-level detoxification performance. The evaluation metrics are BLEU, style accuracy (STA), content preservation (SIM), fluency (FL), and the joint score (J). In particular, the J score of DiffuDetox exceeds that of the human reference, which suggests potential for practical applications.

### Conclusion

By combining conditional and unconditional diffusion models, DiffuDetox addresses the key problems of the text detoxification task: toxicity removal, meaning preservation, diversity, and data scarcity. Future work will optimize the inference speed of the model and explore further improvements.
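As a rough illustration of the two learning stages and the linear combination of predictions described in the summary above, the following minimal Python sketch may help. It is not the authors' implementation. The function names, the use of `None` as the "no condition" marker, and the guidance weight `w` are hypothetical. The specific form `(1 + w) * cond - w * uncond` is borrowed from classifier-free guidance and is an assumption here, since the summary only states that the two predictions are linearly combined.

```python
import random
from typing import List, Optional, Sequence, Tuple


def pick_training_example(
    parallel_pairs: Sequence[Tuple[str, str]],
    fluent_corpus: Sequence[str],
    phi: float,
    rng: random.Random,
) -> Tuple[str, Optional[str]]:
    """Choose one training example as (x0, c).

    With probability phi (conditional stage), draw a (toxic, non_toxic)
    pair from the detoxification data: x0 is the non-toxic text and c is
    the toxic text. Otherwise (unconditional stage), draw x0 from the
    fluent corpus and use no condition (c is None, a hypothetical marker).
    """
    if rng.random() < phi:
        toxic, non_toxic = rng.choice(parallel_pairs)
        return non_toxic, toxic
    return rng.choice(fluent_corpus), None


def combine_predictions(
    cond: Sequence[float], uncond: Sequence[float], w: float
) -> List[float]:
    """Linearly combine conditional and unconditional model outputs.

    Assumed classifier-free-guidance form: (1 + w) * cond - w * uncond.
    With w = 0 this reduces to the conditional prediction alone.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]
```

In this sketch, a larger `phi` spends more training steps on the scarce parallel data, while a smaller `phi` exposes the model to more fluent text. How the authors balance the two is not stated in the summary.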

DiffuDetox: A Mixed Diffusion Model for Text Detoxification
