Abstract: One trending application of large language models (LLMs) is content moderation on online platforms. Most current studies of this application have focused on the metric of accuracy, that is, the extent to which an LLM makes correct decisions about content. This article argues that accuracy is insufficient and misleading, because it fails to grasp the distinction between easy cases and hard cases as well as the inevitable trade-offs in achieving higher accuracy. Closer examination reveals that content moderation is a constitutive part of platform governance, the key to which is gaining and enhancing legitimacy. The chief goal of LLM moderation is therefore not to make moderation decisions correct but to make them legitimate. In this regard, this article proposes a paradigm shift from the single benchmark of accuracy towards a legitimacy-based framework for evaluating the performance of LLM moderators. The framework suggests that for easy cases, the key is to ensure accuracy, speed and transparency, while for hard cases, what matters is reasoned justification and user participation. Examined under this framework, the real potential of LLMs in moderation is not accuracy improvement. Rather, LLMs can contribute more in four other respects: screening hard cases from easy cases, providing quality explanations for moderation decisions, assisting human reviewers in obtaining more contextual information, and facilitating user participation in a more interactive way. Using normative theories from law and the social sciences to critically assess this new technological application, this article seeks to redefine the role of LLMs in content moderation and redirect relevant research in this field.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper argues that the application of large language models (LLMs) to content moderation should not focus on accuracy alone but should place greater emphasis on legitimacy. Specifically, the authors argue:

1. **Limitations of Accuracy**: Most current research focuses on the accuracy of LLMs in content moderation, i.e., the extent to which LLMs make correct moderation decisions. The authors consider this single metric insufficient and misleading. Content moderation is not just a technical issue; it also involves balancing various rights and interests, which may be understood differently in different cultural contexts.
2. **Necessity of Legitimacy**: Content moderation is a constitutive part of platform governance, whose core goal is to gain and enhance legitimacy. Legitimacy includes not only accuracy but also transparency, speed, the reasoning behind decisions, and procedural fairness. The authors therefore propose a shift from a single accuracy metric to a comprehensive, legitimacy-based evaluation framework.
3. **Handling Different Case Types**: The authors distinguish between easy cases and hard cases and propose different evaluation standards for each. For easy cases, the focus is on ensuring accuracy, speed, and transparency; for hard cases, what matters is reasoned justification and user participation.
4. **True Potential of LLMs**: Under the new framework, the main advantages of LLMs lie not in improving accuracy but in the following aspects:
   - Screening hard cases from easy cases
   - Providing high-quality explanations for moderation decisions
   - Assisting human reviewers in obtaining more contextual information
   - Facilitating more interactive user participation

By repositioning the role of LLMs in content moderation, the authors hope to shift the attention of researchers and developers from merely pursuing accuracy to enhancing the overall legitimacy of platform governance. In their view, this both makes better use of the technical strengths of LLMs and addresses the complex challenges of content moderation more comprehensively.

Content Moderation by LLM: From Accuracy to Legitimacy
