Inculcating Context for Emoji Powered Bengali Hate Speech Detection using Extended Fuzzy SVM and Text Embedding Models

Sayani Ghosal, Amita Jain, Devendra Kumar Tayal, Varun G. Menon, Akshi Kumar
DOI: https://doi.org/10.1145/3589001
IF: 1.471
2023-03-27
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: The massive growth of the social web offers opportunities to communicate in diverse languages through unstructured text, informal posts, misspelled content, and emojis. Social media users feel comfortable expressing their emotions, especially high-intensity emotions (hate speech), in their mother tongue. Hate speech in any form targets groups and individuals and may trigger antisocial activities, hate crimes, and terrorist acts. Bengali social media users often post implicit or indirect hate text in Bengali. Existing Bengali hate speech detection research considers explicit hate speech, but in practice hate is more often expressed implicitly. To detect both implicit and explicit hate speech in low-resource content, social platforms need highly efficient automated tools. Researchers have applied discriminative learning approaches (i.e., SVM, MLP, CNN) to distinguish hate text, with clear-cut outcomes only in detecting direct hate speech. The proposed novel Bengali hate speech detection model combines two parallel approaches: (i) an extended fuzzy SVM classifier for class-imbalanced datasets (FSVMCIL) with a multilingual BERT (mBERT) text embedding model to produce the first hate label; and (ii) a morphological analysis method with the hate similarity (HS) scheme to detect implicit and explicit hate content and produce the second hate label. By linking both labeling methods, this research extracts contextual Bengali hate speech from informal text. The novel HS method uses the Word2Vec word embedding model and a Bengali hate lexicon. It also converts emojis to text for efficient contextual analysis. The study conducts extensive experiments across various categories of the Bengali hate speech dataset and evaluates the proposed model using weighted F1 score, precision, recall, and accuracy. Results reveal a significant improvement in Bengali hate speech detection, with a 2.35% increase in F1 score and a 9.11% increase in accuracy.
computer science, artificial intelligence