Abstract: Ensuring content compliance with community guidelines is crucial for maintaining healthy online social environments. However, traditional human-based compliance checking struggles with scaling due to the increasing volume of user-generated content and a limited number of moderators. Recent advancements in Natural Language Understanding demonstrated by Large Language Models unlock new opportunities for automated content compliance verification. This work evaluates six AI-agents built on Open-LLMs for automated rule compliance checking in Decentralized Social Networks, a challenging environment due to heterogeneous community scopes and rules. Analyzing over 50,000 posts from hundreds of Mastodon servers, we find that AI-agents effectively detect non-compliant content, grasp linguistic subtleties, and adapt to diverse community contexts. Most agents also show high inter-rater reliability and consistency in score justification and suggestions for compliance. Human-based evaluation with domain experts confirmed the agents' reliability and usefulness, rendering them promising tools for semi-automated or human-in-the-loop content moderation systems.

### What problems does this paper attempt to solve?

This paper aims to address the challenges of content compliance checking in decentralized social media platforms. Specifically, the authors focus on how to ensure that user-generated content complies with community rules in order to maintain a healthy online social environment. The traditional method relying on manual review is difficult to scale effectively due to the increasing amount of user-generated content and the limited number of reviewers. Therefore, this paper explores the possibility of using AI agents based on large language models (LLMs) to automate community-rule compliance checking.

#### Main problems include:

1. **Limitations of traditional methods**:
   - With the increase in user-generated content and the limited number of reviewers, the traditional manual review method struggles to handle large amounts of data.
   - The large amount of review work is likely to cause psychological stress to reviewers.
2. **The need for automated compliance checking**:
   - Automated tools can help reduce the workload of reviewers and improve review efficiency.
   - Existing automated tools such as Reddit's AutoMod are considered to fall short of human expectations and are not suitable for completely replacing manual review.
3. **The application potential of large language models**:
   - Recent advances in natural-language-understanding technology have demonstrated the great potential of LLMs in text-processing tasks.
   - LLMs may be able to detect content that does not conform to community rules more effectively and adapt to different community contexts and rule sets.
4. **Unique challenges in decentralized social media**:
   - Decentralized social media (such as Mastodon) has a diverse range of communities and rules, making automatic compliance checking more challenging.
   - There are large differences in rules between different servers, requiring AI agents to have sufficient flexibility and adaptability.

#### The specific research questions of the paper include:

- **RQ1**: Can LLM agents perform compliance checking and propose review strategies in communities with public rules?
- **RQ2**: Are there consistencies and differences in the behavior of different LLM agents when providing review strategies?
- **RQ3**: How do domain experts view the review strategies of LLM agents? Do they meet human expectations?

Through these questions, the authors hope to evaluate the performance of LLM agents in automatic content-compliance checking and explore the feasibility of using them as an auxiliary tool to support human reviewers.
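To make the consistency question in RQ2 more concrete, the sketch below shows one common way to quantify agreement between pairs of raters: Cohen's kappa over per-post compliance labels. This is an illustration only. The summary does not say which reliability measure the authors used, and the agent names, labels, and function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(labels_by_agent):
    """Kappa for every pair of agents, keyed by (agent_1, agent_2)."""
    return {
        (a, b): cohen_kappa(labels_by_agent[a], labels_by_agent[b])
        for a, b in combinations(sorted(labels_by_agent), 2)
    }


# Hypothetical labels: 1 = compliant, 0 = non-compliant, one entry per post.
example_labels = {
    "agent_x": [1, 1, 0, 0],
    "agent_y": [1, 1, 0, 0],
    "agent_z": [1, 0, 1, 0],
}
agreement = pairwise_kappa(example_labels)
```

A kappa near 1 indicates that two agents agree far more often than chance would predict, while a value near 0 indicates agreement no better than chance. Measures designed for more than two raters or for ordinal scores may suit a multi-agent study better.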

Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance

Can We Trust AI Agents? An Experimental Study Towards Trustworthy LLM-Based Multi-Agent Systems for AI Ethics

Integrating Content Moderation Systems with Large Language Models

LLMs Among Us: Generative AI Participating in Digital Discourse

Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study

Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos

Social Media Bot Policies: Evaluating Passive and Active Enforcement

Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs

Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models

Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents

Safeguarding AI Agents: Developing and Analyzing Safety Architectures

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Specification, Validation and Verification of Social, Legal, Ethical, Empathetic and Cultural Requirements for Autonomous Agents

Concept-Guided LLM Agents for Human-AI Safety Codesign

Supporting Human-AI Collaboration in Auditing LLMs with LLMs

AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

CodeAgent: Autonomous Communicative Agents for Code Review

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View