Level of agreement between emotions generated by Artificial Intelligence and human evaluation: a methodological proposal

Miguel Carrasco, Cesar Gonzalez-Martin, Sonia Navajas-Torrente, Raul Dastres
2024-10-11
Abstract: Images are capable of conveying emotions, but emotional experience is highly subjective. Advances in artificial intelligence have enabled the generation of images based on emotional descriptions. However, the level of agreement between the generated images and human emotional responses has not yet been evaluated. To address this, 20 artistic landscapes were generated using StyleGAN2-ADA. Four variants evoking positive emotions (contentment, amusement) and negative emotions (fear, sadness) were created for each image, resulting in 80 pictures. An online questionnaire was designed using this material, in which 61 observers classified the generated images. Statistical analyses were performed on the collected data to determine the level of agreement among participants and between the observers' responses and the AI-generated emotions. A generally good level of agreement was found, with better results for negative emotions. However, the study confirms the subjectivity inherent in emotional evaluation.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of evaluating the level of agreement between images produced by generative artificial intelligence (AI) and human emotional responses. The researchers used StyleGAN2-ADA to generate 20 artistic landscape paintings. Each painting has four variants corresponding to four emotions: two positive (contentment, amusement) and two negative (fear, sadness). This gives a total of 80 images. In an online questionnaire, 61 observers classified these images. The responses were used to measure the agreement among participants and the agreement between the AI-generated emotions and human perception.

### Research Background

1. **Relationship between Images and Emotions**:
   - Images can convey emotions, but emotional experience is highly subjective.
   - Techniques for generating images from emotional descriptions have advanced with AI, but the agreement between these generated images and human emotional responses had not yet been evaluated.
2. **Challenges of Emotional Evaluation**:
   - Because emotional responses are subjective, different individuals rarely agree fully in their emotional evaluations.
   - The social and cultural background and the personal experiences of observers also affect their emotional experience.
3. **Emotional Models**:
   - Psychology commonly uses two kinds of emotion model:
     - discrete emotion models, such as those proposed by Ekman or Mikels;
     - multidimensional emotion models, such as those proposed by Wood et al.
   - Multidimensional models typically describe emotions along three dimensions (valence, arousal, and dominance), and emotions are often dichotomized as positive or negative.

### Research Methods

1. **Data Preparation**:
   - The ArtEmis dataset was used. It contains 80,031 records, and each record includes the style of the artwork, the image, the emotion stated by the annotator, the annotator's interpretation, and the number of annotators involved in the work.
   - Only records in the landscape-painting category were kept. This reduced the degree of concreteness of the content and made stimulus and contextual information easier to identify.
   - The dominant emotion of each work was determined by vote frequency, leaving 9,750 valid records.
2. **Modeling**:
   - StyleGAN2-ADA was used to generate 20 landscape paintings, each with four variants corresponding to four emotions (contentment, amusement, fear, sadness).
   - The resulting 80 images were used for the subsequent evaluation.
3. **Evaluation**:
   - An online questionnaire was designed in which 61 observers classified the generated images.
   - Two kinds of agreement were assessed statistically: agreement among participants, and agreement between the AI-generated emotions and human perception. The measures included Krippendorff's alpha, precision, recall, F1-score, Fisher's test, and the Jaccard index.

### Main Results

- The study found a generally good level of agreement, with better results for negative emotions.
- The study confirmed the subjectivity inherent in emotional evaluation and highlighted the need for further research.

### Conclusion

This study proposed a method for evaluating the agreement between images produced by generative AI and human emotional responses, and demonstrated it through an experiment. The results are relevant to the use of generative AI for expressing and understanding emotions.
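The Data Preparation step determines each artwork's dominant emotion by vote frequency. The sketch below shows one way to do this. It is not the authors' code, and several details are assumptions:

- the function name `dominant_emotions`;
- the input format, a flat list of `(artwork, emotion)` annotations;
- the rule that artworks with a tied top vote are discarded.

```python
from collections import Counter, defaultdict


def dominant_emotions(annotations):
    """Map each artwork to its most frequently voted emotion.

    Hypothetical helper. `annotations` is an iterable of (artwork, emotion)
    pairs, one per annotator vote. Assumption (not stated in the source):
    artworks whose top emotion is tied are dropped rather than resolved.
    """
    votes = defaultdict(Counter)
    for artwork, emotion in annotations:
        votes[artwork][emotion] += 1

    result = {}
    for artwork, counter in votes.items():
        ranked = counter.most_common(2)
        # Keep the artwork only if one emotion strictly wins the vote.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            result[artwork] = ranked[0][0]
    return result
```

Dropping ties is a conservative choice: it keeps only artworks with an unambiguous majority, at the cost of discarding some records.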
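Precision, recall, F1-score and the Jaccard index, listed under Evaluation, compare the emotion each image was generated for with the emotion an observer chose. The summary does not say how the authors computed them. The sketch below therefore rests on these assumptions:

- the intended emotion is treated as ground truth;
- labels are paired by position;
- the per-class Jaccard index is defined as TP / (TP + FP + FN);
- the function name `per_emotion_scores` is hypothetical.

```python
def per_emotion_scores(intended, perceived, emotion):
    """Precision, recall, F1 and Jaccard index for one emotion class.

    Hypothetical helper. `intended[i]` is the emotion image i was generated
    for (treated as ground truth); `perceived[i]` is the emotion an observer
    assigned to it.
    """
    if len(intended) != len(perceived):
        raise ValueError("intended and perceived must have the same length")
    pairs = list(zip(intended, perceived))
    tp = sum(1 for i, p in pairs if i == emotion and p == emotion)
    fp = sum(1 for i, p in pairs if i != emotion and p == emotion)
    fn = sum(1 for i, p in pairs if i == emotion and p != emotion)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Per-class Jaccard index: |predicted ∩ true| / |predicted ∪ true|.
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}
```

Calling this once per emotion (contentment, amusement, fear, sadness) would allow positive and negative emotions to be compared class by class.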
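Krippendorff's alpha, also listed under Evaluation, measures agreement among observers beyond what chance would produce. The sketch below uses the standard coincidence-matrix formulation for nominal labels. Each image is a unit and each observer's chosen emotion is one value. For nominal data this gives:

alpha = 1 - (n - 1) * sum_{c != k} o_ck / sum_{c != k} n_c * n_k

Here `o_ck` are the coincidence counts, `n_c` the marginal totals of the coincidence matrix, and `n` their overall sum. The function name and input layout are assumptions, not the authors' code.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (e.g. emotion categories).

    Hypothetical helper. `units` is a list with one inner list per image,
    holding the labels given by the observers who rated that image; missing
    ratings are simply omitted.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit rated by fewer than two observers is not pairable
        # Every ordered pair of values from different observers, weighted 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()  # n_c: marginal totals of the coincidence matrix
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when fewer than two labels occur")
    return 1 - (n - 1) * observed / expected
```

Alpha equals 1 for perfect agreement and is near 0 when agreement is at chance level. Computing it separately on the positive-emotion and negative-emotion images is one way to compare agreement across valences.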
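Fisher's test is listed among the measures, but the summary does not describe the contingency table it was applied to. As an assumption for illustration, the sketch below computes a two-sided Fisher's exact test on a 2x2 table. One possible layout is intended valence (positive/negative) against perceived valence. The function name is hypothetical. The code uses the common rule of summing the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper; the table layout (e.g. intended vs perceived valence)
    is an assumption, not taken from the source.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of top-left cell = x with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

As a toy check, the table [[3, 1], [1, 3]] gives p = 34/70 ≈ 0.486, the textbook two-sided value. Where SciPy is available, `scipy.stats.fisher_exact` provides the same test.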