Multimodal Depression Detection Using a Deep Feature Fusion Network

Guangyao Sun, Shenghui Zhao, Bochao Zou, Yubo An
DOI: https://doi.org/10.1117/12.2662620
2022-01-01
Abstract: Depression has become one of the most severe health issues worldwide, and a growing number of people suffer from it as social pressure increases. Timely diagnosis of depression is therefore very important. In this paper, a deep feature fusion network is proposed for multimodal depression detection. First, an unsupervised transformer-based autoencoder is applied to derive sentence-level embeddings from the frame-level audiovisual features; then a deep feature fusion network based on a cross-modal transformer is proposed to fuse the text, audio, and video features. The experimental results show that the proposed method achieves superior performance compared to state-of-the-art methods on the English database DAIC-WOZ.
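The abstract does not specify the fusion network's layers, so the following is only a minimal sketch of the generic idea behind a cross-modal transformer: scaled dot-product attention in which one modality supplies the queries and another supplies the keys and values. The function name `cross_modal_attention`, the toy embeddings, and the omission of learned projections and multiple heads are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(target, source):
    """Attend from the target modality (queries) to the source modality (keys/values).

    Both arguments are lists of d-dimensional sentence-level embeddings.
    Learned query/key/value projections are omitted (assumed identity here).
    """
    d = len(target[0])
    scale = math.sqrt(d)
    fused = []
    for q in target:
        # Similarity of this target sentence to every source sentence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in source]
        weights = softmax(scores)
        # Weighted average of the source embeddings.
        fused.append([sum(w * v[j] for w, v in zip(weights, source)) for j in range(d)])
    return fused


# Toy data (hypothetical): 2 text sentences and 3 audio sentence embeddings, d = 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

# Text representation enriched with audio information.
text_from_audio = cross_modal_attention(text, audio)
```

Each output row is a convex combination of the audio embeddings, with one row per text sentence, so the two modalities need not have the same sequence length. A full model would typically apply this in both directions for each pair of modalities and add learned projections, residual connections, and feed-forward layers.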