Generative AI for Hate Speech Detection: Evaluation and Findings

Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov
2023-11-17
Abstract: Automatic hate speech detection using deep neural models is hampered by the scarcity of labeled datasets, leading to poor generalization. To mitigate this problem, generative AI has been utilized to generate large amounts of synthetic hate speech sequences from available labeled examples, leveraging the generated data in finetuning large pre-trained language models (LLMs). In this chapter, we provide a review of relevant methods, experimental setups and evaluation of this approach. In addition to general LLMs, such as BERT, RoBERTa and ALBERT, we apply and evaluate the impact of train set augmentation with generated data using LLMs that have been already adapted for hate detection, including RoBERTa-Toxicity, HateBERT, HateXplain, ToxDect, and ToxiGen. An empirical study corroborates our previous findings, showing that this approach improves hate speech generalization, boosting recall performance across data distributions. In addition, we explore and compare the performance of the finetuned LLMs with zero-shot hate detection using a GPT-3.5 model. Our results demonstrate that while better generalization is achieved using the GPT-3.5 model, it achieves mediocre recall and low precision on most datasets. It is an open question whether the sensitivity of models such as GPT-3.5, and onward, can be improved using similar techniques of text generation.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper attempts to address is the poor generalization of automatic hate speech detection caused by the scarcity of labeled datasets. Specifically, the researchers use generative AI to generate a large number of synthetic hate speech sequences that augment existing labeled datasets. These synthetic data are then used to fine-tune large pre-trained language models (LLMs), improving the models' generalization across different data distributions, particularly in terms of recall.

### Background of the Paper

- **Problem Background**: Hate speech is a significant social problem, especially on social media, which makes its automatic detection important. However, existing labeled datasets are scarce and imbalanced, which leads to poor generalization in detection methods based on deep neural networks.
- **Existing Challenges**: The existing datasets are not only limited in size but also exhibit thematic and lexical biases. These biases make models prone to overfitting and ineffective at recognizing hate speech from new or different sources.

### Solution

- **Generative AI**: The researchers use generative AI techniques to generate a large number of synthetic hate speech sequences from existing labeled data.
- **Data Augmentation**: The generated synthetic data are combined with the existing labeled data to fine-tune large pre-trained language models (such as BERT and RoBERTa).
- **Experimental Evaluation**: Cross-dataset evaluations are conducted to verify whether the generated synthetic data improve the models' generalization, particularly in terms of recall.

### Main Findings

- **Improved Generalization**: Experimental results show that augmenting the training data with generated synthetic data can significantly improve generalization, especially in cross-dataset evaluations.
- **Recall Improvement**: In most experiments, recall improved significantly, with an average increase of more than 24%.
- **Precision Decline**: Although recall improved substantially, precision declined in some cases, with an average decrease of 9.6%.
- **Overall Performance**: Overall, the F1 score (the harmonic mean of precision and recall) improved for most models, with an average increase of 5.0%.

### Conclusion

- **Effectiveness of Data Augmentation**: Synthetic data generated by generative AI can effectively improve the generalization of hate speech detection models, particularly in terms of recall.
- **Future Directions**: Despite these improvements, hate speech detection remains a challenging task. Further research is needed, especially on reducing noise and bias in generated data and on improving model precision.

This paper demonstrates experimentally the potential of generative AI for addressing data scarcity in hate speech detection, providing an important reference for future research.
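The findings pair a recall gain with a precision drop and still report a higher F1 score. The minimal Python sketch below shows how that can happen. Because F1 is the harmonic mean of precision and recall, a large recall gain can outweigh a smaller precision loss.

A few caveats apply to the sketch:
- The confusion-matrix counts are hypothetical and chosen for illustration. They are not results from the paper.
- The function name `precision_recall_f1` is illustrative.
- F1 is computed per experiment, so the averaged changes reported above cannot simply be plugged into this formula.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive (hate) class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts on a test set containing 100 hate examples
# (illustrative only, not numbers reported in the paper).
baseline = precision_recall_f1(tp=40, fp=10, fn=60)   # fewer hate posts caught
augmented = precision_recall_f1(tp=60, fp=25, fn=40)  # more caught, more false alarms

for name, (p, r, f) in [("baseline", baseline), ("augmented", augmented)]:
    print(f"{name:<9} P={p:.3f} R={r:.3f} F1={f:.3f}")
# baseline  P=0.800 R=0.400 F1=0.533
# augmented P=0.706 R=0.600 F1=0.649
```

In this toy case, precision falls by about 0.09 while recall rises by 0.20, and F1 still increases. This mirrors the direction, though not the magnitude, of the trade-off described in the findings.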