Abstract: Depression is one of the most common mental disorders. Patients often experience depressed mood, including sadness, guilt, low self-esteem, loss of interest, and reduced functioning. They suffer from serious emotional problems and unexplained distress, which cause enormous losses to individuals, families, and society. According to the World Health Organization, approximately 322 million people worldwide were suffering from depression in 2017, about 54 million of them in China. Depression can be treated effectively. However, because its pathogenesis is complex, clinical diagnosis faces many difficulties. First, mental disorders, and depression in particular, receive too little attention and are often misunderstood by others. Second, patients with depression are reluctant to seek help. Third, potential patients are hard to screen and diagnose precisely, and medical resources for diagnosing depression are limited. A more convenient, objective, and efficient way to assist the rapid identification of depression is therefore needed. Speech, as a relatively objective and easily accessible variable, has potential value: it is easy to acquire, and the speech of depressed patients has been shown to have distinctive characteristics such as a slow speech rate and a lack of cadence. The purpose of this paper is to explore the relationship between speech and depression by building classification models that predict depression from voice features. The study employed a 3 (emotion: positive, neutral, negative) × 3 (task type: question answering, text reading, picture description) experimental design, and voice data were collected from recordings of participants' speech during the different tasks. 
A total of 103 participants were included in this study: 45 patients with depression (age: 23.8–44.6, M = 34.2, SD = 10.4; 22 males, 23 females) and 58 healthy controls (age: 20.1–41.7, M = 30.9, SD = 10.8; 27 males, 31 females). The patients were recruited from Beijing Anding Hospital and Huilongguan Hospital, and the controls were recruited by advertisement. All participants were assessed by specialists using the DSM-IV criteria and the MINI interview. None of the participants had substance abuse or dependence, personality disorders, other mental disorders, serious physical illness, or suicidal behavior. All participants had an education level above elementary school. A total of 988 voice features were extracted from the speech data using the openSMILE software. Logistic regression, a machine learning method, was used to train the prediction models. The prediction precision reached up to 82.9%. Using machine learning methods, this paper built depression prediction models from voice features. The results show that the speech of patients with depression has predictive value, which paves the way for more thorough identification of depression.
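
As an illustration of the classification step described above, the following is a minimal sketch of logistic regression trained by gradient descent and scored by precision on a held-out split. It is not the authors' implementation. The data are synthetic Gaussian features rather than openSMILE output, and the feature count (20 instead of 988), class shift, learning rate, epoch count, regularisation strength, and 70/30 split are all assumptions chosen for illustration. Only the group sizes (45 patients, 58 controls) come from the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n_pos, n_neg, n_features, shift, rng):
    """Draw Gaussian features; the positive class mean is offset by `shift`.

    Stand-in for real acoustic features (assumption, not openSMILE output).
    """
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        for _ in range(n):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """Batch gradient descent on the L2-regularised logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    # Label 1 (depressed) when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def precision(y_true, y_pred):
    # Fraction of predicted positives that are true positives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


rng = random.Random(0)
# 45 patients and 58 controls, as in the abstract; everything else is assumed.
X, y = make_synthetic(45, 58, 20, 0.5, rng)
idx = list(range(len(X)))
rng.shuffle(idx)
split = int(0.7 * len(idx))
train, test = idx[:split], idx[split:]

w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
y_pred = predict(w, b, [X[i] for i in test])
print(f"held-out precision: {precision([y[i] for i in test], y_pred):.3f}")
```

With only 103 speakers and 988 features, a real analysis would likely need feature selection or stronger regularisation and cross-validation rather than a single split. The abstract does not say which of these, if any, were used.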

Research on Depression Detection Algorithm Combine Acoustic Rhythm with Sparse Face Recognition

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection.

Dynamic Facial Features in Positive-Emotional Speech for Identification of Depressive Tendencies

Hybrid Network Feature Extraction for Depression Assessment from Speech

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Multi-Modal and Multi-Task Depression Detection with Sentiment Assistance

Fusing features of speech for depression classification based on higher-order spectral analysis

Attention-Based Acoustic Feature Fusion Network for Depression Detection

Detect Depression from Communication: How Computer Vision, Signal Processing, and Sentiment Analysis Join Forces

Depression Recognition Based on Speech Analysis

Depression Detection Based on Facial Expression, Audio and Gait

Depression recognition base on acoustic speech model of Multi-task emotional stimulus

Feature-level fusion approaches based on multimodal EEG data for depression recognition

Multimodal Sensing for Depression Risk Detection: Integrating Audio, Video, and Text Data

Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech

Facial Geometry and Speech Analysis for Depression Detection

Robust discriminant feature extraction for automatic depression recognition

Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions

Catching Elusive Depression via Facial Micro-Expression Recognition

A Depression Detection Method Based on Multi-Modal Feature Fusion Using Cross-Attention

Depression Assessment Method: An EEG Emotion Recognition Framework Based on Spatiotemporal Neural Network