Depression Severity Estimation from Multiple Modalities

Evgeny Stepanov, Stephane Lathuiliere, Shammur Absar Chowdhury, Arindam Ghosh, Radu-Laurentiu Vieriu, Nicu Sebe, Giuseppe Riccardi

DOI: https://doi.org/10.48550/arXiv.1711.06095

2017-11-10

Abstract: Depression is a major debilitating disorder which can affect people of all ages. With a continuous increase in the number of annual cases of depression, there is a need to develop automatic techniques for the detection of the presence and extent of depression. In this AVEC challenge we explore different modalities (speech, language and visual features extracted from the face) to design and develop automatic methods for the detection of depression. In the psychology literature, the PHQ-8 questionnaire is well established as a tool for measuring the severity of depression. In this paper we aim to automatically predict the PHQ-8 scores from features extracted from the different modalities. We show that visual features extracted from facial landmarks obtain the best performance in terms of estimating the PHQ-8 results with a mean absolute error (MAE) of 4.66 on the development set. Behavioral characteristics from speech provide an MAE of 4.73. Language features yield a slightly higher MAE of 5.17. When switching to the test set, our Turn Features derived from audio transcriptions achieve the best performance, scoring an MAE of 4.11 (corresponding to an RMSE of 4.94), which makes our system the winner of the AVEC 2017 depression sub-challenge.

Computer Vision and Pattern Recognition, Computation and Language

What problem does this paper attempt to address?

This paper addresses the automatic prediction of depression severity from multiple modalities: speech, language, and visual features. The PHQ-8 questionnaire is well established in the psychology literature as a tool for measuring the severity of depression, so the authors frame the task as predicting the PHQ-8 score from features extracted from each modality.

The main contribution is an exploration of different data sources (audio, video, language, and behavioral cues), together with a study of the feature representations and modeling techniques suited to each modality. The authors hope this work can support the early detection and effective care of depression.

In the experiments, the authors describe how they extract speech, behavioral, language, and visual features, and they run regression experiments with machine learning techniques such as Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) networks. Their system achieved the best performance on the test set of the AVEC 2017 Depression Sub-Challenge. The best result came from the "Turn Features" derived from audio transcriptions, with a mean absolute error (MAE) of 4.11 and a corresponding root-mean-square error (RMSE) of 4.94.
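The two metrics quoted above, MAE and RMSE, can be illustrated with a minimal Python sketch. The function names and the score lists here are hypothetical and chosen only for illustration; they are not the authors' code or data. PHQ-8 totals range from 0 to 24.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up PHQ-8 scores for four hypothetical participants (not from the paper).
true_scores = [2, 10, 7, 15]
predicted_scores = [4, 7, 7, 10]

mae = mean_absolute_error(true_scores, predicted_scores)
rmse = root_mean_squared_error(true_scores, predicted_scores)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

RMSE penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with the reported pair of 4.11 (MAE) and 4.94 (RMSE).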

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Multi-Modal and Multi-Task Depression Detection with Sentiment Assistance

Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Hybrid Network Feature Extraction for Depression Assessment from Speech

Dynamic Facial Features in Positive-Emotional Speech for Identification of Depressive Tendencies

Depression Scale Recognition from Audio, Visual and Text Analysis

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features

Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions

Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions

Towards automatic text-based estimation of depression through symptom prediction

Multimodal Measurement of Depression Using Deep Learning Models

Unaligned Multimodal Sequences for Depression Assessment From Speech

The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level

Evaluating Acoustic and Linguistic Features of Detecting Depression Sub-Challenge Dataset

End-to-end multimodal system for depression detection from online recordings

A Multimodal Approach for Detection and Assessment of Depression Using Text, Audio and Video

Topic Modeling Based Multi-modal Depression Detection

Automatic Assessment of Depression from Speech and Behavioural Signals

Facial Geometry and Speech Analysis for Depression Detection