Examining Temporal Bias in Abusive Language Detection

Mali Jin, Yida Mu, Diana Maynard, Kalina Bontcheva
2023-09-25
Abstract: The use of abusive language online has become an increasingly pervasive problem that damages both individuals and society, with effects ranging from psychological harm right through to escalation to real-life violence and even death. Machine learning models have been developed to automatically detect abusive language, but these models can suffer from temporal bias, the phenomenon in which topics, language use or social norms change over time. This study aims to investigate the nature and impact of temporal bias in abusive language detection across various languages and explore mitigation methods. We evaluate the performance of models on abusive data sets from different time periods. Our results demonstrate that temporal bias is a significant challenge for abusive language detection, with models trained on historical data showing a significant drop in performance over time. We also present an extensive linguistic analysis of these abusive data sets from a diachronic perspective, aiming to explore the reasons for language evolution and performance decline. This study sheds light on the pervasive issue of temporal bias in abusive language detection across languages, offering crucial insights into language evolution and temporal bias mitigation.
Computation and Language
What problem does this paper attempt to address?
This paper addresses temporal bias in online abusive language detection. As social norms, topics, and language use change over time, machine-learning models trained on historical data tend to perform worse on newer data. The paper analyzes datasets from different time periods to characterize the nature and impact of temporal bias, and it explores methods to mitigate the problem.

### Research Background and Problem Description

Online abusive language (for example, hate speech) has become an increasingly serious problem. It causes psychological harm to individuals and can also escalate to real-life violence and even death. To detect abusive language automatically, researchers have developed a variety of machine-learning models. However, these models can be affected by temporal bias: as time passes, changes in language and social norms make the models perform worse on new data than on old data.

### Main Research Questions

The paper explores the following core research questions:

1. **RQ1**: How does the degree of temporal bias vary across datasets that differ in language, time span, and collection method?
2. **RQ2**: How does the evolution of language lead to temporal bias in datasets?
3. **RQ3**: Can domain adaptation models, large language models (LLMs), or more robust datasets help mitigate temporal bias in abusive language detection?

### Methods and Experimental Design

To answer these questions, the authors adopted the following methods:

- **Dataset Selection**: Five abusive language datasets covering four languages (English, Spanish, Italian, and Chinese) and different time periods were selected.
- **Data Splitting**: Two strategies, random splitting and time-order (chronological) splitting, were used to divide each dataset into training and test sets.
- **Model Selection**: The predictive performance of Logistic Regression (LR), BERT, RoBERTa, a RoBERTa variant adapted to hate speech (RoBERTa-hate-speech), and OpenAssistant (OA) was compared.
- **Performance Evaluation**: Models were evaluated using accuracy, precision, recall, and macro-F1 score.

### Experimental Results

- **The Influence of Time-order Splitting**: Compared with random splitting, time-order splitting generally led to a decline in model performance, especially on datasets with a long time span. For example, on the WASEEM dataset, time-order splitting with the RoBERTa model led to a 16.93% decrease in the F1 score.
- **Comparison of Different Models**: Domain adaptation models (such as RoBERTa-hate-speech) were better at mitigating temporal bias, especially on datasets with a long time span.
- **Zero-shot Classification**: The OpenAssistant model performed very well on zero-shot classification tasks, and its performance was less affected by temporal bias.

### Conclusions

Through detailed experiments and analyses, the paper shows that temporal bias is widespread in abusive language detection and proposes several possible mitigation methods. These findings matter for improving the robustness and accuracy of online abusive language detection systems.
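
The two splitting strategies and the macro-F1 metric summarized above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`random_split`, `chronological_split`, `macro_f1`), the `date` field, the 20% test fraction, and the toy records are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from datetime import date


def random_split(examples, test_fraction=0.2, seed=0):
    """Shuffle examples, ignoring timestamps, then hold out the tail as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def chronological_split(examples, test_fraction=0.2):
    """Sort by timestamp so the test set contains only the newest examples."""
    ordered = sorted(examples, key=lambda ex: ex["date"])
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy records: one post per month, with alternating labels.
toy = [
    {"text": f"post {i}", "label": i % 2, "date": date(2020, 1 + i, 1)}
    for i in range(10)
]

rand_train, rand_test = random_split(toy)
chrono_train, chrono_test = chronological_split(toy)

# Under the chronological split, every training post predates every test post.
newest_train = max(ex["date"] for ex in chrono_train)
oldest_test = min(ex["date"] for ex in chrono_test)
print(newest_train < oldest_test)
```

In this setup, a random split mixes old and new posts in both sets, so the test set resembles the training data. A chronological split evaluates only on posts newer than anything seen in training, which is the condition under which temporal bias becomes visible as a gap in macro-F1 between the two splits.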