Detection of political hate speech in Korean language

Hyo-sun Ryu, Jae Kook Lee
DOI: https://doi.org/10.1007/s10579-024-09797-x
2024-12-05
Language Resources and Evaluation
Abstract: The proliferation of online hate speech targeting political opponents calls for effective detection and prevention methods. However, existing models have shown limitations in detecting political hate speech in the Korean language. This prompts the development of the Korean Political Hate Speech Classifier (KPHC), which aims to automatically identify political hate speech and incitement to hatred in online comments. This study uses transfer learning to fine-tune three pre-trained models, KcBERT-large, KcELECTRA-base, and KcELECTRA-base-v2022, on a dataset of 20,000 political news comments from South Korea, annotated for the presence of political hate speech and incitement to hatred. The KcELECTRA-base-v2022 model consistently delivers strong performance, even when evaluated on new datasets. By contrast, the KcELECTRA-base and KcBERT-large models perform inconsistently. In particular, their precision and F1 scores are significantly lower on the task of identifying incitement to hatred. Overall, the KcELECTRA-base-v2022 model proves to be the most effective basis for the KPHC in identifying both hate speech and incitement to hatred. This study introduces a new dataset for detecting political hate speech in Korean online comments, offering a tool to quantitatively measure its prevalence and identify its predictors. By defining political hate speech based on well-established social scientific theories, this study proposes guidelines for developing datasets that reflect diverse cultural backgrounds. Furthermore, the KPHC can identify expressions that incite discrimination and violence against target groups, thereby enabling proactive responses to curb such harmful expressions.
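The abstract compares models by precision and F1 on two binary tasks (hate speech and incitement to hatred). As a minimal sketch of how such per-task scores are computed, the snippet below derives precision, recall, and F1 from gold and predicted labels. The function name and the toy label lists are hypothetical and invented for illustration; they are not the paper's data, code, or results.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, assumed for illustration only (1 = hate speech, 0 = not).
hate_true = [1, 0, 1, 1, 0, 0]
hate_pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(hate_true, hate_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Running the same function separately on the incitement-to-hatred labels would give the second set of scores. A model that over-predicts the rarer class shows up as low precision, which is the pattern the abstract reports for two of the three models.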
computer science, interdisciplinary applications