Cross-lingual offensive speech identification with transfer learning for low-resource languages

Xiayang Shi, Xinyi Liu, Chun Xu, Yuanyuan Huang, Fang Chen, Shaolin Zhu
DOI: https://doi.org/10.1016/j.compeleceng.2022.108005
2022-05-25
Abstract: Most research on identifying offensive speech on social media platforms focuses on English and other resource-rich languages. Several recently proposed methods for detecting offensive speech in low-resource languages require labeled data. In this work, we propose an unsupervised model that can detect offensive speech in low-resource languages. Our method does not depend on any labeled data in the low-resource languages. In detail, we propose an agreement-regularized training scheme that combines adversarial learning and transfer learning. We also augment the low-resource training data with sample regeneration methods so that the trained offensive speech identification model maintains its performance when transferred from rich-resource to low-resource languages. Extensive experiments on four low-resource languages demonstrate that our model is on par with or outperforms supervised methods on real-world offensive speech detection tasks, without employing any annotated data for the low-resource languages.
engineering, electrical & electronic; computer science, interdisciplinary applications, hardware & architecture