Abstract: Reddit administrators have generally struggled to prevent or contain such discourse for several reasons, including: (1) the inability of a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, as shown in our background research, administrative actions (community bans and quarantines) are often taken in reaction to media pressure after offensive discourse within a community spills into the real world with serious consequences. In this paper, we investigate the feasibility of proactive moderation on Reddit -- i.e., proactively identifying communities at risk of committing offenses that previously resulted in bans for other communities. Proactive moderation strategies show promise for two reasons: (1) they have the potential to narrow down the communities that administrators need to monitor for hateful content and (2) they give administrators a scientific rationale to back their administrative decisions and interventions. Our work shows that communities are constantly evolving in their user base and topics of discourse, and that evolution into hateful or dangerous (i.e., considered bannable by Reddit administrators) communities can often be predicted months ahead of time. This makes proactive moderation feasible. Further, we leverage explainable machine learning to help identify the strongest predictors of evolution into dangerous communities. This provides administrators with insights into the characteristics of communities at risk of becoming dangerous or hateful. Finally, we investigate, at scale, the impact of participation in hateful and dangerous subreddits and the effectiveness of community bans and quarantines on the behavior of members of these communities.

ModSandbox: Facilitating Online Community Moderation Through Error Prediction and Improvement of Automated Rules

Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support

Let Community Rules Be Reflected in Online Content Moderation

Venire: A Machine Learning-Guided Panel Review System for Community Content Moderation

Platform Governance with Algorithm-Based Content Moderation: An Empirical Study on Reddit

To Act or React: Investigating Proactive Strategies For Online Community Moderation

Post Guidance for Online Communities

Automated Content Moderation Increases Adherence to Community Guidelines

Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

Why am I seeing this: Democratizing End User Auditing for Online Content Recommendations

Beyond Trial-and-Error: Predicting User Abandonment After a Moderation Intervention

The Unsung Heroes of Facebook Groups Moderation: A Case Study of Moderation Practices and Tools

Like trainer, like bot? Inheritance of bias in algorithmic content moderation

AppealMod: Inducing Friction to Reduce Moderator Workload of Handling User Appeals

ModSec-Learn: Boosting ModSecurity with Machine Learning

How Are ML-Based Online Content Moderation Systems Actually Used? Studying Community Size, Local Activity, and Disparate Treatment

"There Has To Be a Lot That We're Missing": Moderating AI-Generated Content on Reddit

A Browser Extension for in-place Signaling and Assessment of Misinformation

Algorithmic content moderation: Technical and political challenges in the automation of platform governance

Content Moderation Justice and Fairness on Social Media: Comparisons Across Different Contexts and Platforms

SoK: An Essential Guide For Using Malware Sandboxes In Security Applications: Challenges, Pitfalls, and Lessons Learned