Text Detoxification as Style Transfer in English and Hindi

Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek
2024-06-10
Abstract: This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text. This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved. We present three approaches: (1) knowledge transfer from a similar task; (2) a multi-task learning approach that combines sequence-to-sequence modeling with various toxicity classification tasks; and (3) a delete-and-reconstruct approach. To support our research, we utilize a dataset provided by Dementieva et al. (2021), which contains multiple versions of detoxified texts corresponding to toxic texts. In our experiments, we selected the best variants through expert human annotators, creating a dataset where each toxic sentence is paired with a single, appropriate detoxified version. Additionally, we introduced a small Hindi parallel dataset, aligned with a part of the English dataset and suitable for evaluation purposes. Our results demonstrate that our approach effectively balances text detoxification with preserving the actual content and maintaining fluency.
Computation and Language
What problem does this paper attempt to address?
The paper addresses text detoxification: automatically converting text that contains offensive or harmful content into non-offensive, non-harmful text. The task can be framed as Text Style Transfer (TST), where the source style is toxic language and the target style is non-toxic language. The goal is to change the style from harmful or offensive to neutral or positive while retaining the core content and fluency of the original text. The authors propose three methods to improve on plain sequence-to-sequence training:

1. **Knowledge Transfer**: transfer knowledge from a similar task.
2. **Multi-task Learning**: combine sequence-to-sequence modeling with various toxicity classification tasks.
3. **Delete and Reconstruct**: delete toxic vocabulary, then reconstruct the sentence.

Additionally, the study uses the dataset provided by Dementieva et al. (2021) and creates a Hindi dataset of 500 parallel sentences for evaluation purposes. With these methods, the authors aim to improve text detoxification in low-resource settings and promote safer, more respectful online communication.
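To make the delete-and-reconstruct idea more concrete, the delete step alone might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the lexicon `TOXIC_LEXICON` and the function `delete_toxic_words` are hypothetical names, the paper may identify toxic vocabulary differently, and the reconstruct step (a trained sequence-to-sequence model that restores a fluent sentence) is not shown.

```python
import re

# Hypothetical toxic lexicon, for illustration only (not taken from the paper).
TOXIC_LEXICON = frozenset({"idiot", "stupid", "moron"})


def delete_toxic_words(sentence, lexicon=TOXIC_LEXICON):
    """Drop whitespace-separated tokens whose lowercased core is in the lexicon.

    Surrounding punctuation is ignored when matching, so "idiot," still
    matches "idiot". The whole token, punctuation included, is removed.
    """
    kept = []
    for token in sentence.split():
        # Strip non-word characters and lowercase before the lexicon lookup.
        core = re.sub(r"[^\w]", "", token).lower()
        if core not in lexicon:
            kept.append(token)
    return " ".join(kept)


# The output is intentionally disfluent; a reconstruction model would
# be expected to turn it back into a fluent, non-toxic sentence.
print(delete_toxic_words("You are a stupid idiot, honestly."))
```

The deliberately crude output shows why a reconstruction step is needed: deletion removes the toxic vocabulary but can leave an ungrammatical remainder.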