Abusive Bangla comments detection on Facebook using transformer-based deep learning models

Tanjim Taharat Aurpa, Rifat Sadik, Md Shoaib Ahmed
DOI: https://doi.org/10.1007/s13278-021-00852-x
2021-12-29
Social Network Analysis and Mining
Abstract: In the era of social networking, user-generated content floods online platforms such as Facebook every second. Content of many kinds, including threats and sexual harassment, is therefore more readily observed and identified than in traditional media. Extremely toxic online content can lead to online harassment, profanity, personal attacks, and bullying. As Bangla is the seventh most spoken language worldwide, its use on Facebook has risen in recent times. The use of abusive Bangla comments on Facebook has also increased alarmingly, yet research on the problem remains scarce. In this work, we concentrate on identifying abusive Bangla comments on social media (Facebook) so that they can be filtered out at the earliest stage, when they are posted. To classify abusive comments swiftly and precisely, we apply transformer-based deep neural network models. We employ two pre-trained language architectures, BERT (Bidirectional Encoder Representations from Transformers) and ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). We conduct this work on a novel dataset comprising 44,001 comments from a large number of Facebook posts. We evaluate the proposed models using average accuracy, precision, recall, and F1-score. The results show that our BERT and ELECTRA models perform notably well, with test accuracies of 85.00% and 84.92%, respectively.
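The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the averages were computed, so macro-averaging over classes is an assumption here, as are the function name `evaluate` and the label encoding (1 for abusive, 0 for non-abusive). The toy labels are invented for illustration and are not the paper's data.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the paper's averaging scheme
    is not stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))

    # Accuracy: fraction of comments whose predicted label is correct.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))

    # Macro average: unweighted mean of the per-class scores.
    n = len(per_class)
    macro = [sum(scores[i] for scores in per_class) / n for i in range(3)]
    return {"accuracy": accuracy, "precision": macro[0],
            "recall": macro[1], "f1": macro[2]}


# Hypothetical toy labels: 1 = abusive, 0 = non-abusive.
report = evaluate([1, 1, 0, 0], [1, 0, 0, 0])
print(report)
```

Macro-averaging weights both classes equally, which matters when abusive and non-abusive comments are not evenly represented; a weighted average would instead favour the majority class.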