Affective Human-Robot Interaction with Multimodal Explanations

Hongbo Zhu, Chuang Yu, Angelo Cangelosi
DOI: https://doi.org/10.1007/978-3-031-24667-8_22
2022-01-01
Abstract: Facial expressions are one of the most practical and straightforward ways to communicate emotions. Facial expression recognition has been used in many fields, such as human behaviour understanding and health monitoring. Deep learning models can achieve excellent performance in facial expression recognition tasks. However, because these deep neural networks have very complex nonlinear structures, it is not easy for human users to understand the basis for a model's prediction. Specifically, we do not know how much each facial unit contributes to the classification. Developing affective computing models with more explainable and transparent feedback for human interactors is essential for trustworthy human-robot interaction. Compared to "white-box" approaches, "black-box" approaches using deep neural networks have advantages in overall accuracy but lack reliability and explainability. In this work, we introduce a multimodal affective human-robot interaction framework with visual and verbal explanations, generated by Layer-Wise Relevance Propagation (LRP) and Local Interpretable Model-Agnostic Explanations (LIME). The proposed framework has been tested on the KDEF dataset and in human-robot interaction experiments with the Pepper robot. This experimental evaluation shows the benefits of linking deep learning emotion recognition systems with explainable strategies.