Unsupervised offensive speech detection for multimedia based on multilingual BERT

Ge Liu, Xiaona Yang, Xiayang Shi, Yinlin Li
DOI: https://doi.org/10.1504/ijsnet.2024.142516
2024-11-07
International Journal of Sensor Networks
Abstract: Multimedia content contains a significant amount of offensive speech, which seriously harms social stability. With the proliferation of sensor-equipped devices contributing to social media data, detecting offensive speech within this vast volume of data has emerged as a critical challenge. However, most existing methods focus on only a few high-resource languages. This paper proposes a cross-lingual aggressive transfer learning method based on bidirectional encoder representations from transformers (BERT) for automatically detecting offensive speech in low-resource languages. First, the multilingual BERT model learns the characteristics of aggressive speech from a high-resource language dataset to establish an initial model. Then, based on the linguistic similarity between languages, this model is transferred to low-resource languages. Experimental results demonstrate that the method achieves higher detection accuracy across multiple languages, including English, Danish, Arabic, Turkish, and Greek, and performs particularly well on low-resource languages.
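The two-step procedure described in the abstract (fit multilingual BERT on a high-resource language, then apply it to low-resource languages) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the Hugging Face `transformers` and `torch` packages, the public `bert-base-multilingual-cased` checkpoint, binary offensive/non-offensive labels, and plain zero-shot transfer with no target-language labels. The abstract does not say how linguistic similarity is used, so that step is left out. The function names `fine_tune_on_source`, `predict`, and `evaluate_transfer` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: the public multilingual BERT checkpoint; the paper does not name one.
MODEL_NAME = "bert-base-multilingual-cased"


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def fine_tune_on_source(texts: Sequence[str], labels: Sequence[int],
                        epochs: int = 1, lr: float = 2e-5, batch_size: int = 16):
    """Step 1: fine-tune mBERT as a binary classifier on high-resource data."""
    # Imported lazily so the pure-Python helpers work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(list(texts[start:start + batch_size]), padding=True,
                              truncation=True, max_length=128, return_tensors="pt")
            target = torch.tensor(list(labels[start:start + batch_size]))
            loss = model(**batch, labels=target).loss  # cross-entropy over 2 classes
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model


def predict(tokenizer, model, texts: Sequence[str], batch_size: int = 32) -> List[int]:
    """Label texts in any language covered by the shared mBERT vocabulary."""
    import torch

    model.eval()
    preds: List[int] = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(list(texts[start:start + batch_size]), padding=True,
                              truncation=True, max_length=128, return_tensors="pt")
            preds.extend(model(**batch).logits.argmax(dim=-1).tolist())
    return preds


def evaluate_transfer(predict_fn: Callable[[Sequence[str]], List[int]],
                      target_sets: Dict[str, Tuple[Sequence[str], Sequence[int]]]
                      ) -> Dict[str, float]:
    """Step 2 (assumed zero-shot): score the source-trained model per target language."""
    return {lang: accuracy(gold, predict_fn(texts))
            for lang, (texts, gold) in target_sets.items()}
```

In use, one would call `fine_tune_on_source` on the high-resource data and then pass `lambda texts: predict(tokenizer, model, texts)` to `evaluate_transfer` together with held-out sets for languages such as Danish, Arabic, Turkish, or Greek. Gold labels on the target side are needed only for scoring, not for training.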
computer science, information systems, telecommunications