Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos

Nouar AlDahoul, Myles Joshua Toledo Tan, Harishwar Reddy Kasireddy, Yasir Zaki
2024-11-26
Abstract: The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes widespread concern among different sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring the media contents are a key solution to addressing these challenges. Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content such as offensive languages, violence, nudity, and addiction in both text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still have limitations in achieving high detection accuracy with fewer false positives and false negatives. Therefore, more sophisticated algorithms for understanding the context of both text and image may open rooms for improvement in content censorship to build a more efficient censorship system. In this paper, we evaluate existing LLM-based content moderation solutions such as OpenAI moderation model and Llama-Guard3 and study their capabilities to detect sensitive contents. Additionally, we explore recent LLMs such as GPT, Gemini, and Llama in identifying inappropriate contents across media outlets. Various textual and visual datasets like X tweets, Amazon reviews, news articles, human photos, cartoons, sketches, and violence videos have been utilized for evaluation and comparison. The results demonstrate that LLMs outperform traditional techniques by achieving higher accuracy and lower false positive and false negative rates. This highlights the potential to integrate LLMs into websites, social media platforms, and video-sharing services for regulatory and content moderation purposes.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper aims to address the challenges faced by current content moderation systems when detecting and filtering inappropriate content such as hate speech, harassment, sexual content, violence, and nudity. Specifically, the authors evaluate the capabilities of existing large language models (LLMs) in detecting sensitive content in text, images, and videos, and explore whether these models can overcome the limitations of existing content moderation solutions.

#### Main problems include:

1. **Deficiencies of existing content moderation systems**:
   - Existing methods have limitations in achieving high detection accuracy while keeping false positive and false negative rates low.
   - More sophisticated algorithms are required to understand the context of text and images and so improve the efficiency and accuracy of media content review.
2. **The need for multi-modal content moderation**:
   - Current content moderation mainly focuses on text, while the detection of inappropriate content in images and videos has been studied less.
   - A comprehensive system is needed to handle sensitive content in text, images, and videos.
3. **Application of large-scale datasets**:
   - Various text and visual datasets (such as Twitter, Amazon reviews, news articles, human photos, cartoons, sketches, and violent videos) are used to evaluate and compare the performance of different models.
4. **Exploration of new technical means**:
   - Explore the potential of the latest large language models (such as OpenAI's GPT-4o, Google's Gemini 1.5, and Meta's Llama-3) in content moderation.
   - Combine computer vision techniques to evaluate the LLMs' ability to detect sensitive content in images and videos.

#### Specific goals of the paper:

- **Evaluate existing content moderation models**: Measure the performance of models such as the OpenAI moderation model and Llama-Guard-3 in detecting inappropriate text and images.
- **Explore the potential of LLMs**: Study the ability of general-purpose LLMs (such as Gemini 1.5, GPT-4o, and Llama-3) to identify inappropriate content in texts such as tweets, comments, and articles.
- **Demonstrate the visual capabilities of LLMs**: Apply LLMs to image and video content review tasks, such as detecting content containing nudity, pornography, violence, child abuse, and alcohol and drug abuse.
- **Propose improvement plans**: Based on the experimental results, suggest how LLMs can be used in content moderation systems to raise accuracy and reduce false positives and false negatives.

In summary, this paper addresses the challenges in current multi-modal content moderation by evaluating LLM-based content moderation systems, with the aim of establishing a more efficient and accurate content review mechanism.
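
The summary compares models by accuracy, false positive rate, and false negative rate. As a rough illustration of how these three quantities relate, here is a minimal Python sketch for binary moderation labels. The function name `moderation_metrics`, the `ModerationMetrics` container, the label convention (1 = sensitive, 0 = benign), and the sample labels are all assumptions made for this example. They are not taken from the paper and do not reproduce its evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class ModerationMetrics:
    accuracy: float
    false_positive_rate: float
    false_negative_rate: float


def moderation_metrics(y_true, y_pred):
    """Compute metrics for binary labels (assumed: 1 = sensitive, 0 = benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sensitive, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sensitive, missed

    accuracy = (tp + tn) / len(pairs)
    # FPR: share of benign items that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of sensitive items that were wrongly passed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return ModerationMetrics(accuracy, fpr, fnr)


# Made-up labels for illustration only (not data from the paper).
ground_truth = [1, 1, 1, 0, 0, 0, 0, 1]
model_output = [1, 0, 1, 0, 1, 0, 0, 1]
print(moderation_metrics(ground_truth, model_output))
```

A false positive here means benign content was flagged (over-blocking), while a false negative means sensitive content slipped through. Reporting both rates alongside accuracy shows which kind of error a moderation model tends to make.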