Abstract:

Background: Social robots are becoming increasingly important as companions in our daily lives. Consequently, humans expect to interact with them using the same mental models applied to human-human interactions, including the use of cospeech gestures. Research efforts have been devoted to understanding users' needs and developing robot behavioral models that can perceive the user state and properly plan a reaction. Despite these efforts, some challenges regarding the effect of robot embodiment and behavior on the perception of emotions remain open.

Objective: The aim of this study is twofold. First, it aims to assess the role of the robot's cospeech gestures and embodiment in the user's perceived emotions in terms of valence (stimulus pleasantness), arousal (intensity of evoked emotion), and dominance (degree of control exerted by the stimulus). Second, it aims to evaluate the robot's accuracy in identifying positive, negative, and neutral emotions displayed by interacting humans using 3 supervised machine learning algorithms: support vector machine, random forest, and K-nearest neighbor.

Methods: The Pepper robot was used to elicit the 3 emotions in humans using a set of 60 images retrieved from a standardized database. In particular, 2 experimental conditions for emotion elicitation were performed with the Pepper robot: one with static behavior and one in which the robot expressed coherent (COH) cospeech behavior. Furthermore, to evaluate the role of the robot embodiment, a third elicitation was performed by asking the participant to interact with a PC, where a graphical interface showed the same images. Each participant took part in only 1 of the 3 experimental conditions.

Results: A total of 60 participants were recruited for this study, 20 for each experimental condition, for a total of 3600 interactions. Valence, arousal, and dominance differed significantly (P<.05) between participants stimulated by the Pepper robot with COH behavior and those in the PC condition, underlining the importance of the robot's nonverbal communication and embodiment. Valence scores were higher for elicitation by the robot (both COH and static behavior) than by the PC. For emotion recognition, the K-nearest neighbor classifiers achieved the best accuracy. In particular, the COH modality achieved the highest accuracy (0.97) when compared with the static behavior and PC elicitations (0.88 and 0.94, respectively).

Conclusions: The results suggest that the use of multimodal communication channels, such as cospeech and visual channels, as in the COH modality, may improve the recognition accuracy of the user's emotional state and can reinforce the perceived emotion. Future studies should investigate the effect of age, culture, and cognitive profile on emotion perception and recognition to address the limitations of this work.
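
To make the recognition task concrete, the following is a minimal sketch of a K-nearest neighbor classifier that assigns one of the 3 emotion labels to a feature vector. It is not the study's pipeline: the abstract does not state which features were extracted from participants, so the 2-dimensional features, cluster centers, noise level, train/test split, and the helper names `make_synthetic_samples`, `knn_predict`, and `accuracy` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def make_synthetic_samples(n_per_class, seed=0):
    """Generate hypothetical 2-D feature vectors, one Gaussian cluster per label.

    The cluster centers and spread are invented for illustration only.
    """
    rng = random.Random(seed)
    centers = {"negative": (-3.0, 0.0), "neutral": (0.0, 3.0), "positive": (3.0, 0.0)}
    samples = []
    for label in LABELS:
        cx, cy = centers[label]
        for _ in range(n_per_class):
            samples.append(((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label))
    rng.shuffle(samples)
    return samples


def knn_predict(train, point, k=3):
    """Return the majority label among the k training samples closest to point."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


samples = make_synthetic_samples(n_per_class=40)
split = int(0.8 * len(samples))  # assumed 80/20 hold-out split
train, test = samples[:split], samples[split:]
print(f"hold-out accuracy on synthetic data: {accuracy(train, test):.2f}")
```

The accuracy printed here reflects only how well separated the synthetic clusters are; it says nothing about the values reported in the Results.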

Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction

Multi-Modal Based Fuzzy Atmosfield in Human-Robot Interaction

Evaluation of Robot Emotion Expressions for Human–Robot Interaction

Robots’ “Woohoo” and “Argh” can Enhance Users’ Emotional and Social Perceptions: An Exploratory Study on Non-Lexical Vocalizations and Non-Linguistic Sounds

Human-Robot Emotional Interaction Model Based on Reinforcement Learning

Conveying Emotions to Robots through Touch and Sound

The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study

Estimating Emotional Intensity from Body Poses for Human-Robot Interaction

Emotion recognition by facial image acquisition: analysis and experimentation of solutions based on neural networks and robot humanoid Pepper

Improving Human-Robot Interaction by Enhancing NAO Robot Awareness of Human Facial Expression

A Multimodal Emotional Communication Based Humans-Robots Interaction System

Emotion recognition models for companion robots

Multi-Modal Hierarchical Empathetic Framework for Social Robots With Affective Body Control

UGotMe: An Embodied System for Affective Human-Robot Interaction

Affective Human-Robot Interaction with Multimodal Explanations

Contactless Interaction System Based on Facial Expression Recognition for Humanoid Piano Robot

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

A Facial Expression Emotion Recognition Based Human-robot Interaction System

Interactive Robot Learning for Multimodal Emotion Recognition

Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives

Emotional Communication Robot Based on 3D Face Model and ASR Technology