Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

Chuanbo Hu,Bin Liu,Minglei Yin,Yilu Zhou,Xin Li

2024-07-09

Abstract:Mobile applications (Apps) could expose children to inappropriate themes such as sexual content, violence, and drug use. Maturity rating offers a quick and effective method for potential users, particularly guardians, to assess the maturity levels of apps. Determining accurate maturity ratings for mobile apps is essential to protect children's health in today's saturated digital marketplace. Existing approaches to maturity rating are either inaccurate (e.g., self-reported rating by developers) or costly (e.g., manual examination). In the literature, there are few text-mining-based approaches to maturity rating. However, each app typically involves multiple modalities, namely app description in the text, and screenshots in the image. In this paper, we present a framework for determining app maturity levels that utilize multimodal large language models (MLLMs), specifically ChatGPT-4 Vision. Powered by Chain-of-Thought (CoT) reasoning, our framework systematically leverages ChatGPT-4 to process multimodal app data (i.e., textual descriptions and screenshots) and guide the MLLM model through a step-by-step reasoning pathway from initial content analysis to final maturity rating determination. As a result, through explicitly incorporating CoT reasoning, our framework enables ChatGPT to understand better and apply maturity policies to facilitate maturity rating. Experimental results indicate that the proposed method outperforms all baseline models and other fusion strategies.

Computers and Society,Artificial Intelligence

What problem does this paper attempt to address?

### What problem does this paper attempt to solve? This paper aims to address the issue of inappropriate content in mobile applications (Apps) that is not suitable for children, particularly content involving sexual material, violence, gambling, or drug use. Such content can negatively impact children's psychological and behavioral development. Current maturity rating systems are either inaccurate (e.g., developer self-reported ratings) or costly (e.g., manual review). To solve this problem, the paper proposes a new framework utilizing Multimodal Large Language Models (MLLMs), specifically using ChatGPT-4 Vision (GPT-4V) to determine the maturity rating of Apps. ### Main contributions of the paper 1. **First systematic study**: The paper conducts the first systematic study using Multimodal Large Language Models for App maturity rating. 2. **Chain-of-Thought prompt design**: It designs Chain-of-Thought (CoT) prompts to guide MLLM in logically deriving the maturity level of an App. 3. **Extensive experimental validation**: Extensive experiments were conducted on a dataset collected from the App Store to validate the effectiveness of the proposed method and demonstrate its advantages over baseline models. ### Key technical points of the paper 1. **Multimodal data processing**: The paper utilizes two modalities of data, text descriptions and screenshots, to improve the accuracy of maturity ratings. 2. **Chain-of-Thought reasoning**: Through CoT reasoning, GPT-4V can analyze the content of an App step by step, thereby determining the maturity level more accurately. 3. **Fusion strategies**: The paper also explores different multimodal fusion strategies to further enhance the accuracy of the ratings. Through the above methods, the new framework proposed in the paper significantly outperforms existing baseline models in predicting App maturity.

Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

Identifying Child Users Via Touchscreen Interactions

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Exploring Parent-Child Perceptions on Safety in Generative AI: Concerns, Mitigation Strategies, and Design Implications

ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models

MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation

The critical need for expert oversight of ChatGPT: Prompt engineering for safeguarding child healthcare information

Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models

Multi role ChatGPT framework for transforming medical data analysis

Are Mobile Advertisements in Compliance with App's Age Group?

Safe Generative Chats in a WhatsApp Intelligent Tutoring System

Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity

The Application and Challenges of ChatGPT in Preschool Education

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Can we use ChatGPT for Mental Health and Substance Use Education? Examining Its Quality and Potential Harms

Multimodal PEAR Chain-of-Thought Reasoning for Multimodal Sentiment Analysis

Exploring ChatGPT App Ecosystem: Distribution, Deployment and Security

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

ChatGPT: The End of Online Exam Integrity?

Examining Multimodal Gender and Content Bias in ChatGPT-4o

MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps