Abstract: The widespread presence of hateful language on social media has adversely affected societal well-being, making it a high priority to address. Hate speech and offensive language exist in both explicit and implicit forms, with the latter being more challenging to detect. Current research in this domain faces several challenges. First, existing datasets primarily rely on collecting texts that contain explicit offensive keywords, which makes it difficult to capture implicitly offensive content that lacks these keywords. Second, common methodologies tend to focus solely on textual analysis, neglecting the valuable insights that community information can provide. In this paper, we introduce OffLanDat, a novel community-based implicit offensive language dataset generated by ChatGPT that contains data for 38 different target groups. Although ChatGPT's ethical constraints limit the generation of offensive text, we present a prompt-based approach that effectively generates implicit offensive language. To ensure data quality, we evaluate our data with human annotators. Additionally, we employ a prompt-based zero-shot method with ChatGPT and compare the detection results between human annotation and ChatGPT annotation. We also evaluate existing state-of-the-art models to assess how effectively they detect such language. We will make our code and dataset publicly available to other researchers.

A Dataset for the Detection of Dehumanizing Language

A Framework for the Computational Linguistic Analysis of Dehumanization

"It's Not Just Hate": A Multi-Dimensional Perspective on Detecting Harmful Speech Online

Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language

LAHM: Large Annotated Dataset for Multi-Domain and Multilingual Hate Speech Identification

D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation

MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection

Abusive Language Detection in Online User Content

AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection

OffLanDat: A Community Based Implicit Offensive Language Dataset Generated by Large Language Model Through Prompt Engineering

Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention

How Hateful are Movies? A Study and Prediction on Movie Subtitles

A Multilingual Dataset for Offensive Language and Hate Speech Detection for Hausa, Yoruba and Igbo Languages

A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media

Deep Learning Models for Multilingual Hate Speech Detection

Korean Online Hate Speech Dataset for Multilabel Classification: How Can Social Science Improve Dataset on Hate Speech?

Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language

Automatic Detection of Sexist Statements Commonly Used at the Workplace

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances

Deep Learning for Hate Speech Detection: A Comparative Study

LLM-Based Synthetic Datasets: Applications and Limitations in Toxicity Detection