Abstract: Automatic hate speech detection using deep neural models is hampered by the scarcity of labeled datasets, leading to poor generalization. To mitigate this problem, generative AI has been utilized to generate large amounts of synthetic hate speech sequences from available labeled examples, leveraging the generated data in finetuning large pre-trained language models (LLMs). In this chapter, we provide a review of relevant methods, experimental setups and evaluation of this approach. In addition to general LLMs, such as BERT, RoBERTa and ALBERT, we apply and evaluate the impact of train set augmentation with generated data using LLMs that have been already adapted for hate detection, including RoBERTa-Toxicity, HateBERT, HateXplain, ToxDect, and ToxiGen. An empirical study corroborates our previous findings, showing that this approach improves hate speech generalization, boosting recall performance across data distributions. In addition, we explore and compare the performance of the finetuned LLMs with zero-shot hate detection using a GPT-3.5 model. Our results demonstrate that while better generalization is achieved using the GPT-3.5 model, it achieves mediocre recall and low precision on most datasets. It is an open question whether the sensitivity of models such as GPT-3.5, and onward, can be improved using similar techniques of text generation.

What problem does this paper attempt to address?

The problem this paper attempts to address is the poor generalization ability in automatic hate speech detection due to the scarcity of labeled datasets. Specifically, the researchers utilize Generative AI to generate a large number of synthetic hate speech sequences to augment existing labeled datasets. These synthetic data are then used to fine-tune large pre-trained language models (LLMs), thereby improving the model's generalization ability across different data distributions, particularly in terms of recall performance.

### Background of the Paper

- **Problem Background**: Hate speech detection is a significant social issue, especially on social media. However, existing labeled datasets are very scarce and imbalanced, leading to poor generalization ability in deep neural network-based automatic detection methods.
- **Existing Challenges**: The existing datasets are not only limited in quantity but also exhibit thematic and lexical biases, making the models prone to overfitting and ineffective in recognizing new or different sources of hate speech.

### Solution

- **Generative AI**: Researchers use Generative AI techniques to generate a large number of synthetic hate speech sequences from existing labeled data.
- **Data Augmentation**: The generated synthetic data are combined with existing labeled data to fine-tune large pre-trained language models (such as BERT, RoBERTa, etc.).
- **Experimental Evaluation**: Cross-dataset experimental evaluations are conducted to verify whether the generated synthetic data can improve the model's generalization ability, particularly in terms of recall performance.

### Main Findings

- **Improved Generalization Ability**: Experimental results show that using generated synthetic data for data augmentation can significantly improve the model's generalization ability, especially in cross-dataset evaluations.
- **Recall Improvement**: In most experiments, the model's recall rate was significantly improved, with an average increase of more than 24%.
- **Precision Decline**: Although the recall rate improved significantly, precision declined in some cases, with an average decrease of 9.6%.
- **Overall Performance**: Overall, the F1 score (the harmonic mean of recall and precision) improved in most models, with an average increase of 5.0%.

### Conclusion

- **Effectiveness of Data Augmentation**: Synthetic data generated by Generative AI can effectively improve the generalization ability of hate speech detection models, particularly in terms of recall.
- **Future Directions**: Despite significant improvements, hate speech detection remains a challenging task that requires further research and exploration, especially in reducing noise and bias in generated data and further improving model precision.

This paper demonstrates through experiments the potential of Generative AI in addressing the data scarcity problem in hate speech detection, providing an important reference for future related research.
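
The trade-off reported in the findings above (recall rises, precision falls, F1 still improves) follows from F1 being the harmonic mean of the two. The sketch below is a minimal illustration of that arithmetic, not the paper's evaluation code. The `Confusion` class, the helper functions, and all counts are hypothetical and chosen only to show the direction of the effect; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for the positive (hate) class on one held-out dataset."""
    tp: int  # hate examples correctly flagged
    fp: int  # non-hate examples wrongly flagged
    fn: int  # hate examples missed


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, NOT taken from the paper.
# Both models are scored on the same test set containing 100 hate examples.
baseline = Confusion(tp=40, fp=10, fn=60)   # trained on labeled data only
augmented = Confusion(tp=65, fp=30, fn=35)  # trained on labeled + synthetic data

for name, c in [("baseline", baseline), ("augmented", augmented)]:
    print(f"{name}: P={precision(c):.3f} R={recall(c):.3f} F1={f1(c):.3f}")
```

With these illustrative counts the augmented model flags more hate examples (higher recall) at the cost of more false alarms (lower precision), and the harmonic mean still moves up because the recall gain outweighs the precision loss.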

Generative AI for Hate Speech Detection: Evaluation and Findings

A Target-Aware Analysis of Data Augmentation for Hate Speech Detection

Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach

Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation

Hate Speech Detection using OpenAI and GPT-3

Detecting Anti-Semitic Hate Speech using Transformer-based Large Language Models

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

LLM-Based Synthetic Datasets: Applications and Limitations in Toxicity Detection

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection

Hate Speech According to the Law: An Analysis for Effective Detection

Investigating Annotator Bias in Large Language Models for Hate Speech Detection

HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models

Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Hate speech detection: A comprehensive review of recent works

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

DeL-haTE: A Deep Learning Tunable Ensemble for Hate Speech Detection

A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs

Semi-Meta-Supervised Hate Speech Detection