Facial action units guided graph representation learning for multimodal depression detection

Changzeng Fu, Fengkui Qian, Yikai Su, Kaifeng Su, Siyang Song, Mingyue Niu, Jiaqi Shi, Zhigang Liu, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro
DOI: https://doi.org/10.1016/j.neucom.2024.129106
IF: 6
2024-12-15
Neurocomputing
Abstract: Depression has become highly prevalent worldwide. Applying technological solutions to compensate for limited resources is crucial for improving early detection of, and intervention in, depressive emotions. Facial expressions and speech are often used to assess depression levels because they contain rich emotional cues. However, the inherent inter-individual variability present in facial expressions and speech introduces noise and redundancy into these cues. Since Facial Action Units (FAUs) represent identity-free facial muscle movements, we propose a novel FAU-Guided Fusion (FAU-GF) strategy that extracts less noisy and less redundant emotional features from human facial and speech behaviors. Moreover, depressive symptoms manifest in a recurrent, episodic, and intermittent fashion, with disjointed temporal connections. To enhance temporal dependency extraction, we represent temporal events as a graph structure and design a Feature-Aware Edge Updater (FAEU) that better models contextual dependencies by learning the edges. We then adopt Graph Attention Networks (GAT) to update the nodes and edges. After that, a Temporal Attention (TA) module selectively focuses on informative affect dynamics for each node. Finally, a Depression Predictor assesses the depression severity level. The proposed model was evaluated on the AVEC 2014 and AVEC 2019 datasets. Experimental results demonstrate that our model outperforms existing methods.
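The graph stage of the pipeline described in the abstract (temporal events as nodes, learned edges, a graph attention update, temporal attention, then a predictor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract gives no layer definitions, so every function name, shape, and weight below is an assumption made for illustration. In particular, cosine similarity stands in for the Feature-Aware Edge Updater, a single-head attention layer stands in for the GAT, and the FAU-Guided Fusion step is omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def feature_aware_edges(nodes):
    # Hypothetical stand-in for the FAEU: score each pair of temporal
    # segments by the cosine similarity of their feature vectors.
    unit = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    return unit @ unit.T  # shape (T, T)

def graph_attention(nodes, edges, W, a_src, a_dst):
    # Single-head GAT-style update. Adding the edge scores to the
    # attention logits is an assumption, not the paper's formulation.
    h = nodes @ W                                          # (T, d_out)
    logits = (h @ a_src)[:, None] + (h @ a_dst)[None, :]   # (T, T)
    logits = np.where(logits > 0, logits, 0.2 * logits)    # LeakyReLU
    alpha = softmax(logits + edges, axis=1)                # rows sum to 1
    return np.tanh(alpha @ h)

def temporal_attention_pool(h, q):
    # Weight each node (time segment) by its relevance to a query vector,
    # then pool the nodes into one clip-level vector.
    weights = softmax(h @ q, axis=0)                       # (T,)
    return weights @ h, weights

# Toy input: T temporal segments with d_in fused features each (random data).
rng = np.random.default_rng(0)
T, d_in, d_out = 6, 8, 4
nodes = rng.normal(size=(T, d_in))
W = rng.normal(size=(d_in, d_out)) * 0.1
a_src, a_dst = rng.normal(size=d_out), rng.normal(size=d_out)
q, w_out = rng.normal(size=d_out), rng.normal(size=d_out)

edges = feature_aware_edges(nodes)
h = graph_attention(nodes, edges, W, a_src, a_dst)
pooled, weights = temporal_attention_pool(h, q)
severity = float(pooled @ w_out)  # untrained linear "Depression Predictor"
```

With untrained random weights the severity score is meaningless. The sketch only shows how data could flow from segment features to a single regression output.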
computer science, artificial intelligence