Abstract: Depression is one of the most common mental disorders. Patients with depression often show depressed mood, including sadness, guilt, low self-esteem, loss of interest, and reduced functioning. They suffer from serious emotional problems and unexplained distress, which cause enormous losses to individuals, families, and society. According to the World Health Organization, approximately 322 million people worldwide were suffering from depression in 2017, about 54 million of them in China. Depression can be treated effectively. However, because the pathogenesis of depression is complex, clinical diagnosis faces many difficulties. First, mental illness, especially depression, does not receive enough attention and is often misunderstood by others. Second, patients with depression are reluctant to ask for help. Third, it is hard to screen and diagnose potential patients precisely, and medical resources for diagnosing depression are limited. A more convenient, objective, and efficient way to support the rapid identification of depression is therefore needed. As a relatively objective and easily accessible variable, speech has potential value here. Patients' speech is easy to acquire, and studies have shown that the voices of depressed patients have distinctive characteristics, such as a slow speech rate and a lack of cadence. The purpose of this paper is to explore the relationship between speech and depression by building classification models that predict depression from voice features. The study used a 3 (emotional mood: positive, neutral, negative) × 3 (task type: question answering, text reading, picture description) experimental design, and voice data were collected from recordings of the participants' speech during the different tasks.
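The 3 × 3 design crosses two factors, so each recording belongs to one of nine (mood, task) conditions. The sketch below simply enumerates those cells. The `recording_plan` helper is hypothetical: it assumes a fully within-subject design in which every participant is recorded once per condition, which the abstract does not state explicitly.

```python
from itertools import product

# The two factors of the 3 x 3 design, as listed in the abstract.
MOODS = ("positive", "neutral", "negative")
TASKS = ("question answering", "text reading", "picture description")


def design_cells():
    """Return all 3 x 3 = 9 (mood, task) conditions of the factorial design."""
    return list(product(MOODS, TASKS))


def recording_plan(participant_ids):
    """Hypothetical within-subject schedule: one recording per participant per condition."""
    return [(pid, mood, task) for pid in participant_ids for mood, task in design_cells()]


if __name__ == "__main__":
    for mood, task in design_cells():
        print(f"{mood:>8} mood / {task}")
```

Under that assumption, the 103 participants described in the study would yield 103 × 9 = 927 recordings, and features or models could be grouped per cell to compare conditions.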
A total of 103 participants were included in this study: 45 patients with depression (age: 23.8–44.6, M = 34.2, SD = 10.4; 22 males, 23 females) and 58 healthy controls (age: 20.1–41.7, M = 30.9, SD = 10.8; 27 males, 31 females). The patients were recruited from Beijing Anding Hospital and Huilongguan Hospital, and the healthy controls were recruited through advertisements. All participants were assessed by specialists using DSM-IV criteria and the MINI interview. None of them had substance abuse, substance dependence, personality disorders, other mental disorders, serious physical illness, or suicidal behavior, and all had an education level above elementary school. A total of 988 voice features were extracted from the speech data using the openSMILE software, and logistic regression, a machine learning method, was used to train the prediction models. The prediction precision reached 82.9%. These results show that voice features of patients with depression have some predictive value, laying a foundation for more thorough identification of depression in future work.
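The abstract names the pipeline (voice feature vectors, logistic regression, a precision score) but not its settings. The following is a minimal, dependency-free sketch under explicit assumptions, not the authors' implementation: synthetic Gaussian data stands in for the openSMILE features (10 dimensions instead of 988 so the pure-Python demo runs quickly), with a 70/30 hold-out split, L2-regularized full-batch gradient descent, and a 0.5 decision threshold. All function names are hypothetical. It computes precision in the standard sense, TP / (TP + FP), for the depressed class; the abstract's "precision rate" may instead refer to overall accuracy, which it does not clarify.

```python
import math
import random


def make_synthetic_cohort(n_depressed=45, n_healthy=58, n_features=10, shift=2.0, seed=0):
    """Hypothetical stand-in for openSMILE feature vectors (NOT real data).

    Depressed rows (label 1) have the means of their first three features shifted
    by `shift`; all other values are standard-normal noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((1, n_depressed), (0, n_healthy)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(3):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle indices once and hold out `test_fraction` of the samples."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def standardize(train, test):
    """Z-score each feature with statistics from the training split only (no test leakage)."""
    n, d = len(train), len(train[0])
    means = [sum(row[j] for row in train) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / n) or 1.0
            for j in range(d)]  # `or 1.0` guards against constant features

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in rows]

    return scale(train), scale(test)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=300, l2=0.01):
    """Full-batch gradient descent on L2-regularized binary cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label a sample 1 (depressed) when its predicted probability reaches `threshold`."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    X_tr, y_tr, X_te, y_te = train_test_split(X, y)
    X_tr, X_te = standardize(X_tr, X_te)
    w, b = train_logistic(X_tr, y_tr)
    print(f"held-out precision: {precision(y_te, predict(w, b, X_te)):.3f}")
```

With only 103 participants, a single hold-out split gives a noisy estimate; k-fold cross-validation and a library implementation such as scikit-learn's `LogisticRegression` would be the more usual choices on real 988-dimensional feature vectors.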

Depression recognition using voice-based pre-training model

[A research on depression recognition based on voice pre-training model]

Automatic Assessment of Depression from Speech Via a Hierarchical Attention Transfer Network and Attention Autoencoders

Hybrid Network Feature Extraction for Depression Assessment from Speech

Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

Deep learning for Depression Recognition from Speech

Depression Recognition Based on Speech Analysis

A Convenient and Low-Cost Model of Depression Screening and Early Warning Based on Voice Data Using for Public Mental Health

Multi-feature deep supervised voiceprint adversarial network for depression recognition from speech

A deep learning-based model for detecting depression in senior population

Depression recognition base on acoustic speech model of Multi-task emotional stimulus

Re-examining the Robustness of Voice Features in Predicting Depression: Compared with Baseline of Confounders

Fast and accurate assessment of depression based on voice acoustic features: a cross-sectional and longitudinal study

Attention-Based Acoustic Feature Fusion Network for Depression Detection

2-level hierarchical depression recognition method based on task-stimulated and integrated speech features

Automatic Detection of Depression from Stratified Samples of Audio Data

Deep learning for depression recognition with audiovisual cues: A review

Fusing features of speech for depression classification based on higher-order spectral analysis

Automatic recognition of depression based on audio and video: A review

Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech