GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels

Severi Santavirta, Yuhang Wu, Lauri Nummenmaa
DOI: https://doi.org/10.1101/2024.08.20.608741
2024-08-21
Abstract: Humans navigate the social world by rapidly perceiving social features from other people and their interactions. Recently, large language models (LLMs) have achieved high-level visual capabilities for detailed object and scene content recognition and description. This raises the question of whether LLMs can perceive nuanced and tacit social information from images and videos, and whether the high-dimensional perceptual structure aligns with that of humans. We collected social perceptual evaluations for 138 social features from GPT-4V for images (N=468) and videos (N=234) that are derived from social movie scenes. These evaluations were compared with human evaluations (N=2254). The comparisons established that GPT-4V can achieve human-like social perceptual capabilities at the level of individual features as well as at the level of high-dimensional perceptual representations. We also modelled hemodynamic responses (N=97) to viewing socioemotional movie clips with feature annotations by human observers and GPT-4V. These results demonstrated that GPT-4V can also reproduce, at the neural level, a social perceptual space highly similar to the one obtained from reference human evaluations. These human-like social perceptual capabilities of LLMs could have a wide range of real-life applications ranging from health care to business and would open exciting new avenues for behavioural and psychological research.
Neuroscience
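
The abstract describes comparing GPT-4V and human evaluations at two levels: individual features and the high-dimensional perceptual structure. The sketch below illustrates one plausible way such a comparison could be set up; it is not the authors' pipeline. It assumes two stimuli-by-features rating matrices, uses Pearson correlation for both levels, and runs on synthetic data. The function names, the noise model, and the choice of Pearson correlation are all assumptions made for illustration; only the matrix dimensions (468 images, 138 features) come from the abstract.

```python
import numpy as np


def featurewise_correlation(human, model):
    """Pearson r per feature (column) between two stimuli x features matrices."""
    h = human - human.mean(axis=0)
    m = model - model.mean(axis=0)
    denom = np.sqrt((h ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return (h * m).sum(axis=0) / denom


def structure_similarity(human, model):
    """Correlate the off-diagonal entries of the two feature-by-feature
    correlation matrices (a simple representational similarity measure)."""
    corr_human = np.corrcoef(human, rowvar=False)
    corr_model = np.corrcoef(model, rowvar=False)
    upper = np.triu_indices_from(corr_human, k=1)
    return np.corrcoef(corr_human[upper], corr_model[upper])[0, 1]


# Synthetic stand-in data (assumption): both raters observe the same
# low-dimensional latent signal plus independent noise.
rng = np.random.default_rng(0)
n_stimuli, n_features = 468, 138
latent = rng.normal(size=(n_stimuli, 10)) @ rng.normal(size=(10, n_features))
human = latent + rng.normal(scale=0.5, size=latent.shape)
model = latent + rng.normal(scale=0.5, size=latent.shape)

r_per_feature = featurewise_correlation(human, model)
r_structure = structure_similarity(human, model)

print("mean feature-level r:", round(float(r_per_feature.mean()), 3))
print("structure-level r:", round(float(r_structure), 3))
```

With real data, `human` would hold averaged human ratings and `model` the GPT-4V ratings for the same stimuli and features, with columns aligned in the same order.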