A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement

Guillermo Villate-Castillo, Javier Del Ser, Borja Sanz
2024-11-07
Abstract: Content moderation typically combines the efforts of human moderators and machine learning models. However, these systems often rely on data where significant disagreement occurs during moderation, reflecting the subjective nature of toxicity perception. Rather than dismissing this disagreement as noise, we interpret it as a valuable signal that highlights the inherent ambiguity of the content, an insight missed when only the majority label is considered. In this work, we introduce a novel content moderation framework that emphasizes the importance of capturing annotation disagreement. Our approach uses multitask learning, where toxicity classification serves as the primary task and annotation disagreement is addressed as an auxiliary task. Additionally, we leverage uncertainty estimation techniques, specifically Conformal Prediction, to account for both the ambiguity in comment annotations and the model's inherent uncertainty in predicting toxicity and disagreement. The framework also allows moderators to adjust thresholds for annotation disagreement, offering flexibility in determining when ambiguity should trigger a review. We demonstrate that our joint approach enhances model performance, calibration, and uncertainty estimation, while offering greater parameter efficiency and improving the review process in comparison to single-task methods.
Computation and Language, Artificial Intelligence, Human-Computer Interaction
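
The abstract says that conformalized estimates of annotation disagreement, combined with a moderator-adjustable threshold, decide when a comment is sent for human review. The following is a minimal sketch of that idea using plain split conformal prediction on absolute residuals. It is not the authors' implementation: the function names, the calibration numbers, the `alpha` level, and the rule "review when the interval's upper bound reaches the threshold" are all assumptions made for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this alpha: interval is unbounded.
        return math.inf
    return sorted(residuals)[k - 1]


def needs_review(pred_disagreement, q, threshold):
    """Flag a comment when the upper end of its conformal interval reaches the threshold.

    Disagreement is assumed to be a score in [0, 1], so the upper bound is clipped at 1.
    """
    upper = min(1.0, pred_disagreement + q)
    return upper >= threshold


# Hypothetical calibration set: predicted vs. observed annotation disagreement.
cal_pred = [0.10, 0.40, 0.25, 0.70, 0.55, 0.05, 0.30, 0.80, 0.15]
cal_true = [0.15, 0.30, 0.25, 0.90, 0.50, 0.00, 0.45, 0.70, 0.20]
residuals = [abs(p - t) for p, t in zip(cal_pred, cal_true)]

# Interval half-width at an assumed 80% target coverage.
q = conformal_quantile(residuals, alpha=0.2)

# A moderator-chosen threshold (hypothetical value) decides which comments get reviewed.
for pred in (0.20, 0.50):
    print(pred, needs_review(pred, q, threshold=0.6))
```

Lowering `threshold` sends more comments to review and raising it sends fewer, which mirrors the flexibility the abstract describes. The model that produces `pred_disagreement` is left out of the sketch on purpose, since the abstract does not specify it beyond multitask learning.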