Abstract:
Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.
Objective: The aim of the research was to evaluate the ability of ChatGPT-4 and Google Bard to discern visual mentalizing indicators, as contrasted with their text-based mentalizing abilities.
Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. In parallel, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.
Results: ChatGPT-4 displayed a pronounced ability in emotion recognition, scoring 26 and 27 in 2 distinct evaluations, which differed significantly from a random response pattern (P<.001). These scores align with established benchmarks from the general human population. Notably, ChatGPT-4 responded consistently, with no discernible biases pertaining to the sex of the photographed model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, with scores of 10 and 12, rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, and their performances were remarkably congruent.
Conclusions: ChatGPT-4 proved effective in visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation require further scrutiny and potential refinement. This study underscores the importance of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
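
The comparison of the reported scores against random responding can be illustrated with a one-sided binomial tail probability. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test used, and the sketch assumes the revised Reading the Mind in the Eyes Test format of 36 items with four response options each, giving a chance rate of 0.25. The names `binomial_tail`, `N_ITEMS`, and `P_CHANCE` are illustrative.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumptions (not stated in the abstract): 36 items, 4 options per item.
N_ITEMS = 36
P_CHANCE = 0.25

# Scores as reported in the abstract, two evaluations per model.
reported_scores = {"ChatGPT-4": (26, 27), "Google Bard": (10, 12)}

for model, runs in reported_scores.items():
    for score in runs:
        p_value = binomial_tail(score, N_ITEMS, P_CHANCE)
        print(f"{model}: score {score}/{N_ITEMS}, P(X >= {score}) = {p_value:.3g}")
```

Under these assumptions, scores of 26 and 27 give tail probabilities far below .001, whereas scores of 10 and 12 lie close to the chance expectation of 9 correct answers and do not differ from random responding at conventional thresholds.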

Unveiling the Potential of ChatGPT and YOLOv7 for Evaluating Children's Emotions Using Their Artistic Expressions

A Children's Psychological and Mental Health Detection Model by Drawing Analysis based on Computer Vision and Deep Learning

Emotional Facial Expression Detection using YOLOv8

EmoEden: Applying Generative Artificial Intelligence to Emotional Learning for Children with High-Function Autism

GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study

Image Analysis through the lens of ChatGPT-4

Kids' Emotion Recognition Using Various Deep-Learning Models with Explainable AI

A Portrait of Emotion: Empowering Self-Expression through AI-Generated Art

Real-time emotion generation in human-robot dialogue using large language models

A Wide Evaluation of ChatGPT on Affective Computing Tasks

A pilot study of measuring emotional response and perception of LLM-generated questionnaire and human-generated questionnaires

Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study

AutYOLO-ATT: an attention-based YOLOv8 algorithm for early autism diagnosis through facial expression recognition

Investigating Large Language Models' Perception of Emotion Using Appraisal Theory

An emotion analysis in learning environment based on theme-specified drawing by convolutional neural network

Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT

Leveraging Language Models for Emotion and Behavior Analysis in Education

Towards Interpretable Mental Health Analysis with Large Language Models

A Novel Active-Learning Based Emotion-Vision-Transformer Network for Expression Recognition

Fine-grained Affective Processing Capabilities Emerging from Large Language Models