Depression Severity Estimation from Multiple Modalities

Evgeny Stepanov, Stephane Lathuiliere, Shammur Absar Chowdhury, Arindam Ghosh, Radu-Laurentiu Vieriu, Nicu Sebe, Giuseppe Riccardi
DOI: https://doi.org/10.48550/arXiv.1711.06095
2017-11-10
Abstract: Depression is a major debilitating disorder which can affect people of all ages. With a continuous increase in the number of annual cases of depression, there is a need to develop automatic techniques for the detection of the presence and extent of depression. In this AVEC challenge we explore different modalities (speech, language and visual features extracted from face) to design and develop automatic methods for the detection of depression. In psychology literature, the PHQ-8 questionnaire is well established as a tool for measuring the severity of depression. In this paper we aim to automatically predict the PHQ-8 scores from features extracted from the different modalities. We show that visual features extracted from facial landmarks obtain the best performance in terms of estimating the PHQ-8 results with a mean absolute error (MAE) of 4.66 on the development set. Behavioral characteristics from speech provide an MAE of 4.73. Language features yield a slightly higher MAE of 5.17. When switching to the test set, our Turn Features derived from audio transcriptions achieve the best performance, scoring an MAE of 4.11 (corresponding to an RMSE of 4.94), which makes our system the winner of the AVEC 2017 depression sub-challenge.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
This paper addresses the automatic prediction of depression severity from multiple modalities: speech, language, and visual features extracted from the face. In the psychology literature, the PHQ-8 questionnaire is well established as a tool for measuring the severity of depression, so the authors frame the task as predicting the PHQ-8 score from features extracted from each modality. The main contribution is an exploration of different data sources (audio, video, language, and behavioral cues), together with the feature representations and modeling techniques suited to each, with the aim of improving automatic prediction and supporting the early detection and effective care of depression. In the experiments, the authors describe how features are extracted from speech, behavior, language, and the face, and run regression experiments with machine-learning techniques such as Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) networks. On the development set, visual features from facial landmarks perform best (MAE 4.66), followed by behavioral characteristics from speech (MAE 4.73) and language features (MAE 5.17). On the test set, the "Turn Features" derived from audio transcriptions perform best, with a mean absolute error (MAE) of 4.11 and a corresponding root-mean-square error (RMSE) of 4.94, which makes the system the winner of the AVEC 2017 depression sub-challenge.
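The results above are reported as MAE and RMSE between predicted and true PHQ-8 scores. The following is a minimal sketch of how these two metrics are computed, not the authors' evaluation code. The function names, the example scores, and the clipping step are illustrative assumptions; the clipping range of 0 to 24 follows from the PHQ-8 having eight items scored 0 to 3.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length score lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length score lists."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def clip_phq8(score):
    """Clamp a raw regression output to the valid PHQ-8 range [0, 24].

    Assumption: clipping is an illustrative post-processing step, not
    something the paper is known to apply.
    """
    return max(0.0, min(24.0, score))

# Hypothetical PHQ-8 scores, made up for illustration only.
true_scores = [2, 10, 17, 5]
raw_predictions = [4.0, 7.0, 26.0, 5.0]
predictions = [clip_phq8(p) for p in raw_predictions]

print(f"MAE:  {mae(true_scores, predictions):.2f}")
print(f"RMSE: {rmse(true_scores, predictions):.2f}")
```

RMSE is never smaller than MAE on the same predictions, and the gap grows when a few errors are much larger than the rest, which is why the two figures are usually reported together.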