Abstract:

Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.

Objective: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.

Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.

Results: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.

Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
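
The chance-level comparison reported in the Results can be illustrated with a short calculation. The sketch below is not the authors' analysis, since the abstract does not name the test that was used. It assumes the revised adult Reading the Mind in the Eyes Test, with 36 items and 4 response options per item (a guessing probability of 0.25), and computes a one-sided exact binomial tail probability for each reported score. The function name `binom_upper_tail` and the constants are illustrative.

```python
from math import comb

N_ITEMS = 36     # assumption: revised adult RMET has 36 items
P_CHANCE = 0.25  # assumption: 4 response options per item, so 1/4 by guessing


def binom_upper_tail(k, n=N_ITEMS, p=P_CHANCE):
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. the chance of
    scoring at least k correct answers by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Scores taken from the abstract (two runs per model).
scores = {"ChatGPT-4": (26, 27), "Google Bard": (10, 12)}

for name, runs in scores.items():
    for s in runs:
        tail = binom_upper_tail(s)
        print(f"{name}: {s}/{N_ITEMS} correct, P(X >= {s} | guessing) = {tail:.3g}")
```

Under these assumptions, the tail probabilities for scores of 26 and 27 fall far below .001, whereas those for 10 and 12 remain well above conventional significance thresholds, which is consistent with the pattern the abstract describes.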

Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Emotional Images: Assessing Emotions in Images and Potential Biases in Generative Models

Identification and Description of Emotions by Current Large Language Models

Level of Agreement between Emotions Generated by Artificial Intelligence and Human Evaluation: A Methodological Proposal

A Portrait of Emotion: Empowering Self-Expression through AI-Generated Art

EmoEden: Applying Generative Artificial Intelligence to Emotional Learning for Children with High-Function Autism

Survey of Research on End-to-End Emotional Dialogue Generation

Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study

Emotionally Enriched Feedback via Generative AI

The pursuit of happiness: the power and influence of AI teammate emotion in human-AI teamwork

Can Generative Agents Predict Emotion?

Generative Technology for Human Emotion Recognition: A Scope Review

Emotional Conveyance Analysis of Artificial Intelligence Painting

Comparing emotions in ChatGPT answers and human answers to the coding questions on Stack Overflow

Decoding emotional responses to AI-generated architectural imagery

Human-like Affective Cognition in Foundation Models

Emotional Artificial Intelligence as a Tool for Human-Machine Interaction

Real-time emotion generation in human-robot dialogue using large language models

Assessing Emotion and Sensitivity of AI Artwork