Embracing naturalistic paradigms: substituting GPT predictions for human judgments

Xuan Yang, Christian O'Reilly, Svetlana V Shinkareva
DOI: https://doi.org/10.1101/2024.06.17.599327
2024-06-22
Abstract: Naturalistic paradigms can assure ecological validity and yield novel insights in psychology and neuroscience. However, using behavioral experiments to obtain the human ratings necessary to analyze data collected with these paradigms is usually costly and time-consuming. Large language models like GPT have great potential for predicting human-like behavioral judgments. The current study evaluates the performance of GPT as a substitute for human judgments for affective dynamics in narratives. Our results revealed that GPT's inference of hedonic valence dynamics is highly correlated with human affective perception. Moreover, the inferred neural activity based on GPT-derived valence ratings is similar to inferred neural activity based on human judgments, suggesting the potential of using GPT's prediction as a reliable substitute for human judgments.
Neuroscience
What problem does this paper attempt to address?
This paper addresses a practical obstacle in psychology and neuroscience research that uses naturalistic paradigms: collecting the human behavioral ratings needed to analyze the data is time-consuming and expensive. As an alternative, the study proposes using large language models such as GPT to predict human affective judgments. Specifically, the authors measure the correlation between GPT's predictions of hedonic valence dynamics in narratives and human affective perception, and they use fMRI data to check the reliability of those predictions.
The study finds that GPT's inference of hedonic valence in narratives correlates highly with human ratings, and that the neural activity inferred from GPT's ratings is similar to the neural activity inferred from human judgments. This suggests that GPT predictions can serve as reliable substitutes for human judgments. A comparison of GPT, human ratings, and lexicon-based SCOPE values shows that, in certain cases, GPT outperforms the lexicon-based predictions and compares favorably with groups of human raters of a certain size.
The paper also describes how to run such an experiment with GPT: segmenting the narratives, collecting human behavioral data, querying GPT, and handling potential errors. It further demonstrates the potential of GPT predictions in affective neuroscience by analyzing the correlation between GPT-predicted hedonic valence and fMRI data. In conclusion, the research targets the difficulty of obtaining human evaluation data in naturalistic studies, proposes GPT predictions as an effective and cost-efficient alternative, and supports the feasibility of this approach with empirical evidence.
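The central comparison described above, correlating GPT's valence ratings with averaged human ratings across narrative segments, can be illustrated with a short sketch. This is not the authors' code: the function names `mean_human_rating` and `pearson_r` are hypothetical, the rating values are made up for illustration, and it assumes one valence rating per narrative segment from GPT and from each human rater.

```python
from math import sqrt
from statistics import mean


def mean_human_rating(ratings_by_rater):
    """Average ratings across raters, segment by segment.

    ratings_by_rater: list of per-rater lists, each holding one
    valence rating per narrative segment (all the same length).
    """
    return [mean(segment) for segment in zip(*ratings_by_rater)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length rating series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative (made-up) valence ratings for four narrative segments.
human_ratings = [
    [1, 2, 3, 4],  # hypothetical rater 1
    [2, 2, 4, 4],  # hypothetical rater 2
]
gpt_ratings = [1, 2, 3, 4]  # hypothetical GPT output, one value per segment

human_mean = mean_human_rating(human_ratings)
r = pearson_r(gpt_ratings, human_mean)
print(f"GPT vs. mean human valence: r = {r:.3f}")
```

Averaging across raters before correlating is one common choice; the paper may aggregate or compare ratings differently, so treat this only as an illustration of the general idea.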