Abstract: The automated identification of toxicity in texts is a crucial area of text analysis, since social media is replete with unfiltered content that ranges from mildly abusive to downright hateful. Researchers have found unintended bias and unfairness originating in training datasets, which lead to inaccurate classification of toxic words in context. In this paper, several approaches for locating toxicity in texts are presented and assessed, with the aim of enhancing the overall quality of text classification. General unsupervised methods were used, building on state-of-the-art models and external embeddings, to improve accuracy while mitigating bias and improving the F1-score. The suggested approaches combine a long short-term memory (LSTM) deep learning model with GloVe word embeddings, and an LSTM with word embeddings generated by Bidirectional Encoder Representations from Transformers (BERT), respectively. These models were trained and tested on a large secondary qualitative dataset containing a large number of comments labeled as toxic or not. The results show that LSTM with BERT word embeddings achieved an accuracy of 94% and an F1-score of 0.89 in the binary classification of comments (toxic and nontoxic). The combination of LSTM and BERT performed better than both LSTM alone and LSTM with GloVe word embeddings. This paper addresses the problem of classifying comments with high accuracy by pretraining models on larger text corpora (high-quality word embeddings) rather than relying solely on the training data.
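The abstract reports both accuracy and F1-score for the binary toxic/nontoxic task. As a minimal sketch of how the two metrics relate, the following computes them from predicted and true labels. The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the paper or its data.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels, 1 = toxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nontoxic, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nontoxic, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical toy labels: 2 toxic comments out of 10, one of them missed.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
```

On imbalanced data such as this toy example, accuracy stays high even when half of the toxic comments are missed, while F1 drops noticeably. This is one reason to report both figures.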

An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding