Multimodal Emotion Recognition for Children with Autism Spectrum Disorder in Social Interaction

Jingjing Liu, Zhiyong Wang, Wei Nie, Jia Zeng, Bingrui Zhou, Jingxin Deng, Huiping Li, Qiong Xu, Xiu Xu, Honghai Liu
DOI: https://doi.org/10.1080/10447318.2023.2232194
IF: 4.92
2024-01-01
International Journal of Human-Computer Interaction
Abstract: Autism Spectrum Disorder (ASD) remains a healthcare challenge and has gained considerable attention owing to its increasing prevalence and the heavy burden it places on families and society. Recognizing children's emotional states plays an important role in the evaluation of and intervention for ASD. In this paper, we address the automatic recognition of the emotional states of children with ASD in social interactive scenarios. Because the child is unconstrained in realistic scenarios, face occlusion under pose variations and uncertain backgrounds make this task challenging. To tackle this problem, we use both facial expressions and body poses as cues for recognizing emotional states, whereas most traditional methods rely only on the former. First, for facial information, spatial features are extracted by convolutional neural networks, followed by a temporal transformer that extracts temporal information. Then, for body pose information, graph convolutional networks combined with self-attention represent spatial features, and temporal convolutional layers represent their temporal counterparts. Finally, several multimodal fusion strategies are explored to generate the final recognition results. We evaluate the method on a challenging database that we collected in real-world child-clinician interactive scenarios; the proposed method achieved significantly better results than baselines using only facial information. This suggests that the recognized emotions could assist clinical practice when provided as feedback.
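The abstract does not say which fusion strategies were explored, so the following is only a minimal sketch of one common option, weighted late fusion of per-modality class probabilities, and should not be read as the authors' method. The function names, the `face_weight` parameter, and the three emotion labels are all hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(face_logits, pose_logits, face_weight=0.5):
    """Weighted average of the face and body-pose class probabilities.

    Assumption: each branch (face, body pose) has already produced one
    score per emotion class. face_weight is a hypothetical mixing weight.
    """
    if len(face_logits) != len(pose_logits):
        raise ValueError("both branches must score the same classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face_p = softmax(face_logits)
    pose_p = softmax(pose_logits)
    return [
        face_weight * f + (1.0 - face_weight) * p
        for f, p in zip(face_p, pose_p)
    ]


def predict(face_logits, pose_logits, labels, face_weight=0.5):
    """Return the label with the highest fused probability, plus the scores."""
    fused = late_fusion(face_logits, pose_logits, face_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


if __name__ == "__main__":
    # Hypothetical label set; the abstract does not list the classes.
    labels = ["negative", "neutral", "positive"]
    # An occluded face yields uninformative (flat) scores,
    # so the body-pose branch decides the outcome.
    label, scores = predict([0.0, 0.0, 0.0], [0.1, 0.2, 3.0], labels)
    print(label, [round(s, 3) for s in scores])
```

In this sketch, a flat face distribution leaves the ranking to the pose branch, which illustrates why a second modality can help when the face is occluded.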