Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish

Dalia A. Rodríguez, Julia Diaz-Escobar, Arnoldo Díaz-Ramírez, Leonardo Trujillo
DOI: https://doi.org/10.1007/s13278-023-01128-2
2023-09-30
Social Network Analysis and Mining
Abstract: Violence against women is a major social issue. One in every three women worldwide has been subjected to physical or sexual violence. The pervasive violence against women in the physical world, the ever-growing presence of social media in our lives, and its lack of content moderation have led to an influx of misogynistic social media content. We contribute to preventing violence against women by introducing a BERT architecture with domain-adaptive pre-training to automatically detect misogynistic tweets in Spanish. We used the IberEval 2018 Spanish dataset for automatic misogyny identification, obtaining an accuracy of 84.60%, precision of 79.64%, recall of 86.70%, and F1 score of 83.02%, outperforming the state of the art. We also conducted a manual error analysis and discovered 469 mislabeled tweets and a misogynistic bias in the IberEval 2018 Spanish dataset. Our debiased model outperformed the current literature on automatic misogyny detection with an accuracy of 84.35%, precision of 84.64%, recall of 83.93%, and F1 score of 84.28%. Lastly, we addressed the need for misogyny detection on other social media by experimenting with a manually curated and labeled dataset of Facebook comments in Spanish. On this dataset, the model reached an accuracy of 87.85%. Misogyny is a complex social issue, so an interdisciplinary approach might benefit future models for automatically detecting misogyny.
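As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below is illustrative only and is not taken from the paper; the helper name `f1_score` and the dictionary labels are assumptions introduced here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract, converted from percentages.
# The labels "original" and "debiased" are illustrative names, not the paper's.
reported = {
    "original": (0.7964, 0.8670),
    "debiased": (0.8464, 0.8393),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.2f}%")
```

Both recomputed values agree with the reported F1 scores of 83.02% and 84.28% to two decimal places.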