Abstract: The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes concern across many sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring media content are a key part of addressing these challenges. Techniques from natural language processing and computer vision have been widely used to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction in text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still struggle to achieve high detection accuracy while keeping false positives and false negatives low. More sophisticated algorithms that understand the context of both text and images may therefore open room for improvement in content censorship and help build a more efficient censorship system. In this paper, we evaluate existing LLM-based content moderation solutions, such as the OpenAI moderation model and Llama-Guard3, and study their ability to detect sensitive content. Additionally, we explore how well recent LLMs such as GPT, Gemini, and Llama identify inappropriate content across media outlets. Various textual and visual datasets, including X tweets, Amazon reviews, news articles, human photos, cartoons, sketches, and violence videos, are used for evaluation and comparison. The results demonstrate that LLMs outperform traditional techniques by achieving higher accuracy and lower false positive and false negative rates. This highlights the potential to integrate LLMs into websites, social media platforms, and video-sharing services for regulatory and content moderation purposes.

What problem does this paper attempt to address?

This paper aims to address the challenges that current content moderation systems face when detecting and filtering inappropriate content such as hate speech, harassment, sexual content, violence, and nudity. Specifically, the authors evaluate how well existing large language models (LLMs) detect sensitive content in text, images, and videos, and explore whether these models can overcome the limitations of existing content moderation solutions.

#### Main problems include:

1. **Deficiencies of existing content moderation systems**:
   - Existing methods are limited in high-precision detection and in reducing false positive and false negative rates.
   - More sophisticated algorithms are required to understand the context of text and images and so improve the efficiency and accuracy of media content review.
2. **The need for multi-modal content moderation**:
   - Current content moderation mainly focuses on text, while the detection of inappropriate content in images and videos has been studied less.
   - A comprehensive system is needed to handle sensitive content in text, images, and videos.
3. **Application of large-scale datasets**:
   - Use various text and visual datasets (such as Twitter, Amazon reviews, news articles, human photos, cartoons, sketches, and violent videos) to evaluate and compare the performance of different models.
4. **Exploration of new technical means**:
   - Explore the potential of the latest large language models (such as OpenAI's GPT-4o, Google's Gemini 1.5, and Meta's Llama-3) in content moderation.
   - Combine computer vision techniques to evaluate the LLMs' ability to detect sensitive content in images and videos.

#### Specific goals of the paper:

- **Evaluate existing content moderation models**: Measure how models such as the OpenAI moderation model and Llama-Guard-3 perform in detecting inappropriate text and images.
- **Explore the potential of LLMs**: Study the ability of general-purpose LLMs (such as Gemini 1.5, GPT-4o, and Llama-3) to identify inappropriate content in texts such as tweets, comments, and articles.
- **Demonstrate the visual capabilities of LLMs**: Apply LLMs to image and video content review tasks, such as detecting content containing nudity, pornography, violence, child abuse, or alcohol and drug abuse.
- **Propose improvement plans**: Based on the experimental results, suggest how LLMs can be used in content moderation systems to raise accuracy and reduce false positives and false negatives.

In summary, this paper addresses the challenges in current multi-modal content moderation by evaluating and improving LLM-based content moderation systems, with the aim of establishing a more efficient and accurate content review mechanism.
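The comparison criteria named above (accuracy, false positive rate, false negative rate) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' evaluation code: it assumes each item carries a binary ground-truth label (sensitive or not) and a binary model decision (flagged or not), and the names `ModerationMetrics` and `moderation_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModerationMetrics:
    accuracy: float
    false_positive_rate: float  # benign items wrongly flagged / all benign items
    false_negative_rate: float  # sensitive items missed / all sensitive items


def moderation_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> ModerationMetrics:
    """Compare ground-truth labels with model decisions.

    Assumption: True means "sensitive" in y_true and "flagged" in y_pred.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # sensitive, flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # benign, not flagged
    fp = sum(1 for t, p in pairs if not t and p)      # benign, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # sensitive, missed

    # Rates default to 0.0 when the corresponding class is absent from the data.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return ModerationMetrics(
        accuracy=(tp + tn) / len(pairs),
        false_positive_rate=fpr,
        false_negative_rate=fnr,
    )
```

Reporting the two error rates separately, rather than accuracy alone, matters for moderation because the costs differ: a false positive censors legitimate content, while a false negative lets harmful content through.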

Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
