DCNN and DNN Based Multi-Modal Depression Recognition.

Le Yang,Dongmei Jiang,Wenjing Han,Hichem Sahli
DOI: https://doi.org/10.1109/acii.2017.8273643
2017-01-01
Abstract:In this paper, we propose an audio visual multimodal depression recognition framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. For each modality, corresponding feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, which are then fed into a DNN to predict the PHQ-8 score. For multi-modal depression recognition, the predicted PHQ-8 scores from each modality are integrated in a DNN for the final prediction. In addition, we propose the Histogram of Displacement Range as a novel global visual descriptor to quantify the range and speed of the facial landmarks' displacements. Experiments have been carried out on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the Audio-Visual Emotion Challenge (AVEC 2016), results show that the proposed multi-modal depression recognition framework obtains very promising results on both the development set and test set, which outperforms the state-of-the-art results.
What problem does this paper attempt to address?