Multimodal Depression Detection based on Factorized Representation

Guanhe Huang, Honghai Liu, Wenchao Shen, Heli Lu, Feihu Hu, Jing Li
DOI: https://doi.org/10.1109/HDIS56859.2022.9991717
2022-12-10
Abstract: Untreated depression increases the chance of risky behavior, including suicide. However, there is a lack of treatment, since traditional depression diagnosis can be time-consuming and expensive. Recently, a growing body of evidence suggests that facial motions and language usage differ significantly between patients with depression and healthy individuals. In this paper, we devise a novel auto-encoder framework with a multimodal factorization technique for depression detection based on facial images and transcribed texts, aiming to eliminate redundancies and focus on key factors in the visual and textual modalities. It consists of three stages: feature extraction and memory-based modality fusion, multimodal factorization, and reconstruction and prediction. First, high-level features are extracted from facial images and transcribed texts by ResNet-50 and BERT, respectively. These features are also fused by a memory fusion network to obtain cross-modal features. Then, multimodal factorization takes the three kinds of features (visual, textual, and cross-modal) to predict depression severity and jointly reconstruct the single-modal inputs. We conduct experiments and ablation studies on a self-collected Chinese depression detection dataset to demonstrate the effectiveness and robustness of our method.
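The three-stage data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: random linear maps stand in for ResNet-50, BERT, the memory fusion network, and the factorization encoders and decoders. All dimensions, function names, the choice of which factors feed each decoder, and the `recon_weight` parameter are assumptions made only for illustration.

```python
import random

random.seed(0)

def linear(in_dim, out_dim):
    """Random linear map used as a stand-in for a trained layer (assumption)."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# Hypothetical feature sizes; the abstract does not state any dimensions.
VIS_DIM, TXT_DIM, FUSED_DIM, LATENT = 8, 6, 5, 4

# Stage 1: cross-modal fusion (placeholder for the memory fusion network).
fuse = linear(VIS_DIM + TXT_DIM, FUSED_DIM)

# Stage 2: factorization of the three kinds of features into latent factors.
enc_vis = linear(VIS_DIM, LATENT)
enc_txt = linear(TXT_DIM, LATENT)
enc_fused = linear(FUSED_DIM, LATENT)

# Stage 3: severity prediction and reconstruction of the single-modal inputs.
predict = linear(3 * LATENT, 1)
dec_vis = linear(2 * LATENT, VIS_DIM)   # assumed: visual + cross-modal factors
dec_txt = linear(2 * LATENT, TXT_DIM)   # assumed: textual + cross-modal factors

def forward(vis, txt, target, recon_weight=1.0):
    """Run the sketch once and return the prediction, reconstructions, and joint loss."""
    fused = fuse(vis + txt)                       # list concatenation, then fusion
    z_v, z_t, z_f = enc_vis(vis), enc_txt(txt), enc_fused(fused)
    score = predict(z_v + z_t + z_f)[0]           # predicted depression severity
    vis_hat = dec_vis(z_v + z_f)                  # reconstructed visual features
    txt_hat = dec_txt(z_t + z_f)                  # reconstructed textual features
    recon = mse(vis_hat, vis) + mse(txt_hat, txt)
    loss = (score - target) ** 2 + recon_weight * recon
    return score, vis_hat, txt_hat, loss

# Stand-ins for features that ResNet-50 and BERT would produce.
vis = [random.random() for _ in range(VIS_DIM)]
txt = [random.random() for _ in range(TXT_DIM)]
score, vis_hat, txt_hat, loss = forward(vis, txt, target=10.0)
```

The point of the sketch is the joint objective: the same latent factors must both predict severity and reconstruct each single-modal input, which is one plausible reading of how the reconstruction term pushes the factors to retain key information while the prediction term keeps them task-relevant.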
Computer Science