Abstract: The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes concern across many sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring media content are a key part of addressing these challenges. Techniques from natural language processing and computer vision have been widely used to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction in text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still struggle to achieve high detection accuracy while keeping false positives and false negatives low. More sophisticated algorithms that understand the context of both text and images may therefore open room for improvement in content censorship and lead to a more efficient censorship system. In this paper, we evaluate existing LLM-based content moderation solutions, such as the OpenAI moderation model and Llama-Guard3, and study their ability to detect sensitive content. Additionally, we explore how well recent LLMs such as GPT, Gemini, and Llama identify inappropriate content across media outlets. Various textual and visual datasets, including X tweets, Amazon reviews, news articles, human photos, cartoons, sketches, and violence videos, have been used for evaluation and comparison. The results demonstrate that LLMs outperform traditional techniques, achieving higher accuracy and lower false positive and false negative rates. This highlights the potential to integrate LLMs into websites, social media platforms, and video-sharing services for regulatory and content moderation purposes.

SLM-Mod: Small Language Models Surpass LLMs at Content Moderation

Watch Your Language: Investigating Content Moderation with Large Language Models

Large Language Models for Automatic Detection of Sensitive Topics

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos

Sentiment Analysis in the Era of Large Language Models: A Reality Check

MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms

Supporting Human Raters with the Detection of Harmful Content using Large Language Models

SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models

Can Language Model Moderators Improve the Health of Online Discourse?

Small Language Models: Survey, Measurements, and Insights

Scaling Up LLM Reviews for Google Ads Content Moderation

Multilingual Content Moderation: A Case Study on Reddit

Large Language Models as Subpopulation Representative Models: A Review

Can AI Moderate Online Communities?

LLMs to the Moon? Reddit Market Sentiment Analysis with Large Language Models

Can Large Language Models Transform Computational Social Science?

Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning

Integrating Content Moderation Systems with Large Language Models

Joint Repetition Suppression and Content Moderation of Large Language Models