Abstract: Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data where significant disagreement occurs during moderation, reflecting the subjective nature of toxicity perception. Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight missed when only the majority label is considered. In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement. Our approach uses multitask learning, where toxicity classification serves as the primary task and annotation disagreement is addressed as an auxiliary task. Additionally, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty in predicting toxicity and disagreement. The framework also allows moderators to adjust thresholds for annotation disagreement, offering flexibility in determining when ambiguity should trigger a review. We demonstrate that our joint approach enhances model performance, calibration, and uncertainty estimation, while offering greater parameter efficiency and improving the review process in comparison to single-task methods.
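The combination of a multitask objective, conformal prediction, and a moderator-adjustable disagreement threshold can be illustrated with a small plain-Python sketch. This is not the authors' implementation, and several pieces are assumptions. It assumes a model that already outputs a toxicity probability `p_toxic` and a predicted disagreement rate `d_hat` in [0, 1]. The joint loss pairs binary cross-entropy with a squared-error auxiliary term weighted by a hypothetical `lam`. Calibration uses standard split conformal prediction, with one minus the true-label probability as the classification score and the absolute residual for disagreement. The helper names, the toy data, and the `needs_review` rule (defer when the toxicity prediction set is not a single label, or when the upper end of the disagreement interval exceeds a moderator-chosen `tau`) are all hypothetical.

```python
import math
from typing import Sequence, Set, Tuple


def joint_loss(p_toxic: float, y: int, d_hat: float, d: float, lam: float = 0.5) -> float:
    """Per-comment multitask loss (assumed form): BCE on the primary toxicity
    head plus a lam-weighted squared error on the auxiliary disagreement head."""
    eps = 1e-12  # guards against log(0)
    bce = -(y * math.log(p_toxic + eps) + (1 - y) * math.log(1 - p_toxic + eps))
    return bce + lam * (d_hat - d) ** 2


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def calibrate_toxicity(probs: Sequence[float], labels: Sequence[int], alpha: float) -> float:
    # Nonconformity score: 1 minus the probability given to the true label.
    scores = [1 - p if y == 1 else p for p, y in zip(probs, labels)]
    return conformal_quantile(scores, alpha)


def toxicity_set(p_toxic: float, q_tox: float) -> Set[int]:
    # Keep every label (0 = non-toxic, 1 = toxic) whose score is within q_tox.
    return {y for y, score in ((1, 1 - p_toxic), (0, p_toxic)) if score <= q_tox}


def calibrate_disagreement(preds: Sequence[float], targets: Sequence[float], alpha: float) -> float:
    # Nonconformity score: absolute residual of the disagreement head.
    return conformal_quantile([abs(t - d) for d, t in zip(preds, targets)], alpha)


def disagreement_interval(d_hat: float, q_dis: float) -> Tuple[float, float]:
    # Disagreement is assumed to be a rate in [0, 1], so clip the interval.
    return max(0.0, d_hat - q_dis), min(1.0, d_hat + q_dis)


def needs_review(p_toxic: float, d_hat: float, q_tox: float, q_dis: float, tau: float) -> bool:
    # Hypothetical deferral rule: send to a moderator when the toxicity set is
    # not exactly one label, or when disagreement could exceed the threshold tau.
    ambiguous = len(toxicity_set(p_toxic, q_tox)) != 1
    _, upper = disagreement_interval(d_hat, q_dis)
    return ambiguous or upper > tau


# Toy calibration data (made up for illustration only).
cal_probs = [0.9, 0.8, 0.3, 0.1, 0.7, 0.2, 0.95, 0.4, 0.6, 0.05]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
cal_d_hat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5]
cal_d = [0.15, 0.1, 0.3, 0.5, 0.45, 0.3, 0.25, 0.2, 0.4, 0.7]

q_tox = calibrate_toxicity(cal_probs, cal_labels, alpha=0.1)
q_dis = calibrate_disagreement(cal_d_hat, cal_d, alpha=0.1)

print(toxicity_set(0.5, q_tox))                   # {0, 1}: ambiguous
print(needs_review(0.9, 0.1, q_tox, q_dis, 0.5))  # False
print(needs_review(0.9, 0.4, q_tox, q_dis, 0.5))  # True: disagreement may exceed tau
print(needs_review(0.9, 0.4, q_tox, q_dis, 0.7))  # False after raising tau
```

In this sketch, raising `tau` defers fewer comments on disagreement grounds alone. Lowering `alpha` raises both conformal thresholds, which widens the prediction sets and intervals and routes more comments to human review.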

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity

A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study

Toxicity Detection: Does Context Really Matter?

Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection

Leveraging Large Language Models and Topic Modeling for Toxicity Classification

A Survey of Toxic Comment Classification Methods

Impact of Sentiment Detection to Recognize Toxic and Subversive Online Comments

Reading Between the Demographic Lines: Resolving Sources of Bias in Toxicity Classifiers

ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

SS-BERT: Mitigating Identity Terms Bias in Toxic Comment Classification by Utilising the Notion of "Subjectivity" and "Identity Terms"

Purging the Poison: A Machine Learning Approach to Filtering Toxic Comments

Analyzing Toxicity in Deep Conversations: A Reddit Case Study

Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

A multitask learning framework for leveraging subjectivity of annotators to identify misogyny

How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments

Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language

RECAST: Enabling User Recourse and Interpretability of Toxicity Detection Models with Interactive Visualization

Toxic Comments Hunter : Score Severity of Toxic Comments